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Motion  Coordination  and  Adaptation  Using  Deception  and  Human 

Interactions 

FINAL  REPORT,  OCT.  2016 


Magnus  Egerstedt  and  Panagiotis  Tsiotras 


This  project  focuses  on  the  development  of  fundamental  new  tools  and  techniques  for  how  to 
structure  the  coordination  and  control  strategies  in  teams  of  mobile  robots.  In  particular,  two 
general  thrust  areas  are  pursued,  focusing  on  human-swarm  interactions  and  pursuit-evasion-based 
motion  control  strategies.  Although  interesting  in  their  own  rights,  the  unifying  theme  behind 
these  two  different  thrusts  is  the  notion  of  intent,  where  the  first  thrust,  which  can  be  thought  of  as 
evolving  at  a  higher  level  of  abstraction,  focuses  on  how  user  intent  can  be  injected  into  a  network 
of  mobile  agents  in  a  fundamentally  sound  manner.  The  second  thrust,  in  turn,  focuses  on  how 
the  intent  can  be  hidden  in  order  to  produce  effective,  deception-based  coordination  and  pursuit 
strategies. 


1  Summary  of  Accomplishments 


During  the  evolution  of  this  project,  a  number  of  different  results  and  directions  have  been  pursued, 

and  we  here  briefly  summarize  these  and  then  discuss  them  in  greater  detail  in  subsequent  sections. 

a)  How  can  effective  interaction  abstractions  be  established  for  teams  of  mobile  robots  that  allow 
users  to  instantaneously  effect  the  operation  of  the  team?  We  developed  a  novel  notion  of  ma- 
nipulability  for  defining  this  abstraction.  Manipulability  was  furthermore  used  to  define  haptic 
interactions  that  can  be  defined  for  enabling  single  operators  to  control  and  interact  with  teams 
of  mobile  robots.  Since  there  is  no  unique  nor  canonical  mapping  from  the  swarm  configuration 
to  the  forces  experienced  by  the  operator,  multi-agent  manipulability  was  selected  as  the  can¬ 
didate  mapping,  whereby  the  forces  experienced  by  the  operator  relate  to  how  directly  input 
directions,  injected  at  a  leader  agent  in  the  network,  translate  to  motions  of  the  follower  agents. 
Experimental  results  support  the  efficacy  of  the  proposed,  haptic,  human-swarm  interaction 
mapping,  through  user  studies  where  operators  are  tasked  with  driving  a  collection  of  robots 
through  a  series  of  way  points. 

b)  In  order  to  establish  what,  ultimately,  constitutes  effective  human-robot-team  interactions,  we 
need  formal  guarantees  for  whether  or  not  a  given  human-swarm  interaction  (HSI)  is  appropriate 
for  achieving  multi-robot  tasks.  Examples  of  such  tasks  include  guiding  a  swarm  of  robots  into 
a  particular  geometric  configuration.  In  doing  so,  we  define  what  it  means  to  impose  a  HSI 
structure  on  a  multi-robot  system.  Several  examples  of  multi-robot  systems  with  unique  HSI 
structures  have  been  considered  to  illustrated  the  viability  of  the  proposed  approach. 

c)  One  of  the  main  challenges  in  human-swarm  interactions  is  the  construction  of  suitable  abstrac¬ 
tions  that  make  an  entire  robot  team  amenable  to  human  control.  For  such  abstractions  to 
be  useful,  they  need  to  scale  gracefully  as  the  number  of  robots  increases.  In  this  work,  we 
consider  the  use  of  time-varying  density  functions  to  externally  influence  a  robot  swarm.  Den¬ 
sity  functions  abstract  away  the  size  of  the  robot  team  and  describe  instead  the  concentration 
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of  agents  over  the  domain  of  interest.  This  allows  a  human  operator  to  design  densities  so  as 
to  manipulate  the  robot  swarm  as  a  whole,  instead  of  at  the  individual  robot  level.  We  have 
developed  coverage  of  time- varying  density  functions  as  a  mechanism  to  translate  densities  into 
robotic  movement,  and  provided  a  series  of  control  laws  that  guarantee  optimal  coverage  by  the 
robot  team. 

d)  We  considered  the  following  differential  game  of  pursuit  and  evasion  involving  two  participating 
players:  an  evader,  which  has  limited  maneuverability,  and  an  agile  pursuer.  The  agents  move 
on  the  Euclidean  plane  with  different  but  constant  speeds.  Whereas  the  pursuer  can  change 
the  orientation  of  its  velocity  vector  arbitrarily  fast,  the  evader  cannot  make  turns  having 
a  radius  smaller  than  a  specified  minimum  turning  radius.  This  problem  can  be  seen  as  a 
reversed  Homicidal  Chauffeur  game,  hence  the  name  “Suicidal  Pedestrian  Differential  Game.” 
We  derived  the  optimal  strategies  of  the  two  players  and  characterized  the  initial  conditions  that 
lead  to  capture  if  the  pursuer  acts  optimally,  and  areas  that  guarantee  evasion  regardless  of  the 
pursuer’s  strategy.  Both  proximity-capture  and  point-capture  were  considered.  After  applying 
the  optimal  strategy  for  the  evader,  it  was  shown  that  the  case  of  point-capture  reduces  to  a 
special  version  of  Zermelo’s  Navigation  Problem  (ZNP)  for  the  pursuer.  The  ZNP  is  a  classical 
problem  of  finding  minimum-time  optimal  paths  in  the  presence  of  flow  fields.  Therefore,  the 
well-known  ZNP  solution  can  be  used  to  validate  the  results  obtained  through  the  differential 
game  framework,  as  well  as  to  characterize  the  time-optimal  trajectories. 

Having  derived  the  optimal  strategies  of  the  two  players  and  characterized  the  initial  conditions 
that  lead  to  capture  if  the  pursuer  acts  optimally,  and  areas  that  guarantee  evasion  regardless 
of  the  pursuer’s  strategy,  we  then  focused  our  efforts  on  its  multi-player  extension.  Namely, 
utilizing  the  connection  to  Zermelo’s  Navigation  Problem,  we  were  able  to  generalize  the  two- 
player  solution  under  some  assumptions  for  the  case  of  multiple  pursuer  UAVs. 

e)  We  considered  a  relay  pursuit-evasion  problem  with  two  pursuers  and  one  evader,  where  only  one 
of  the  pursuers  is  active  in  pursuit  of  the  evader  at  each  instant  of  time.  We  reduced  the  problem 
to  a  one-pur  suer /one-evader  problem  subject  to  a  state  constraint  and  derived  the  optimal 
control  strategy  of  the  evader  to  maximize  the  capture  time.  We  also  proposed  a  suboptimal,  yet 
practical,  control  strategy  for  the  evader  to  prolong  capture  that  does  not  require  the  solution 
of  the  corresponding  two-point  boundary-value  optimal  control  problem.  Extensions  to  the 
multiple-pursuer /one-evader  case  were  also  presented  and  evaluated  via  numerical  simulations. 

f)  We  investigated  solutions  of  a  pursuit-evasion  game  of  two  players  in  the  presence  of  an  external 
flow  held.  The  external  how  held  is  approximated  by  a  time-invariant  affine  function.  By 
utilizing  standard  techniques  from  differential  game  theory,  we  characterized  the  regions  of 
initial  conditions  that  lead  to  capture,  as  well  as  the  regions  that  result  in  evasion  when  the  two 
players  act  optimally.  We  derived  the  optimal  strategies  of  the  pursuer  and  the  evader  within 
the  capture  regions.  We  showed  that  in  the  presence  of  an  external  how  held,  the  player  with  an 
accurate  knowledge  of  the  held  has  an  advantage.  Finally,  we  presented  numerical  simulations  of 
the  resulting  pursuer  and  evader  trajectories  for  several  values  of  the  parameters  of  the  external 
how  held. 

g)  We  dealt  with  the  problem  of  a  team  of  pursuers  distributed  in  the  plane  subject  to  an  environ¬ 
mental  disturbance  (e.g.,  winds).  The  objective  of  the  pursuers  is  to  intercept  a  moving  target 
which  is  not  affected  by  the  presence  of  the  disturbance.  We  solved  this  problem  by  assigning 
only  one  pursuer  to  chase  the  target  at  every  instant  of  time,  based  on  a  Voronoi-like  partition 
of  the  plane.  During  the  pursuit,  the  pursuer  assignment  changes  dynamically  based  on  this 
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partition.  We  presented  an  algorithm  to  efficiently  update  this  Voronoi-like  partition  on-line 
and  validated  the  theoretical  results  through  numerical  simulations. 

h)  We  adopted  a  reachability-based  approach  to  deal  with  the  pursuit-evasion  differential  game 
between  two  players  in  the  presence  of  dynamic  environmental  disturbances  (e.g.,  winds,  sea 
currents).  We  gave  conditions  for  the  game  to  terminate  in  terms  of  reachable  set  inclusions. 
Level  set  equations  were  defined  and  solved  to  generate  the  reachable  sets  of  the  pursuer  and 
the  evader.  The  corresponding  time-optimal  trajectories  and  optimal  strategies  can  be  retrieved 
immediately  afterwards.  We  validated  our  method  by  applying  it  to  a  pursuit-evasion  game 
in  a  simple  flow  field,  for  which  an  analytical  solution  is  available.  We  also  implemented  the 
proposed  scheme  to  a  problem  with  a  more  realistic  flow  field. 

i)  We  derived  the  min-max  Differential  Dynamic  Programming  (GT-DDP)  algorithm  for  solving  a 
large  class  of  differential  games  in  continuous  time.  We  provided  a  set  of  backward  differential 
equations  for  the  value  function  expansion  without  assuming  closeness  of  the  initial  nominal 
control  to  the  optimal  control  solution,  and  derived  the  update  law  for  the  controls.  We  intro¬ 
duced  the  GT-DDP  algorithm  and  analyzed  the  effect  of  the  game  theoretic  formulation  in  the 
feed-forward  and  feedback  parts  of  the  control  policies. 

j)  We  developed  a  sampling-based  algorithm  designed  to  solve  differential  game  and  risk-sensitive 
stochastic  optimal  control  problems.  T  he  cornerstone  of  the  proposed  approach  is  the  formu¬ 
lation  of  the  problem  in  terms  of  forward  and  backward  stochastic  differential  equations  (FBS- 
DEs).  By  means  of  a  nonlinear  version  of  the  Feynman-Kac  lemma,  we  obtained  a  probabilistic 
representation  of  the  solution  to  the  nonlinear  Hamilton- Jacobi- Isaacs  equation,  expressed  in 
the  form  of  a  decoupled  system  of  FBSDEs.  This  system  of  FBSDEs  can  then  be  simulated  by 
employing  linear  regression  techniques.  Utilizing  the  connection  between  stochastic  differential 
games  and  risk-sensitive  optimal  control,  we  demonstrate  that  the  proposed  algorithm  is  also 
applicable  to  the  latter  class  of  problems.  The  algorithm  was  validated  by  simulations. 

2  Human-Swarm  Interactions 


The  emerging  field  of  Human-Swarm  Interactions  (HSI)  concerns  itself  with  the  problem  of  making 
large  teams  of  autonomous  agents  amenable  to  human  control.  The  rationale  for  this  is  the  need  to 
invert  the  current  many-to-one  relationship  where  multiple  operators  are  required  to  control  a  single 
autonomous  vehicle,  such  as  an  unmanned  aerial  vehicle.  As  many  facets  of  society  move  towards 
greater  levels  of  automation,  this  traditional  mode  of  interaction  is  not  sustainable.  This  becomes 
even  more  clear  in  the  various  branches  of  the  military  in  general,  and  the  air  force  in  particular, 
where  the  trend  clearly  is  pointing  towards  greater  usage  of  autonomous  vehicles  coexisting  with 
humans. 

Not  surprisingly,  the  issue  of  human  control  of  unmanned  vehicle  fleets  has  received  considerable 
attention  during  the  last  decade,  mainly  focusing  on  particular  types  of  scenarios  and  on  how  the 
information  flow  should  be  structured  so  that  the  human  operator  is  not  overloaded  yet  retains 
sufficient  situational  awareness.  What  this  project  focused  on  was  a  fundamental  treatment  of 
the  human-swarm  interaction  problem  -  not  in  terms  of  particular  interfaces  and  algorithms,  but 
in  terms  of  system  theoretic  properties  that  human-swarm  interaction  structures  should  exhibit 
for  the  higher-level  questions  (interfaces,  et.c.)  to  be  meaningful  beyond  anecdotal  and  particular 
scenarios. 
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2.1  Haptic  Manipulability 


When  controlling  a  team  of  robots  via  external  inputs,  it  is  desirable  to  know  how  the  inputs  are 
affecting  the  team’s  behavior.  In  some  situations,  it  is  desirable  to  have  all  of  the  agents  moving 
as  one  collective  unit,  with  each  robot  having  the  same  velocity.  Instead  of  broadcasting  the  signal 
to  all  robots,  we  control  the  movement  of  one  robot  directly  and  use  local  interaction  laws  to 
control  the  motion  of  the  followers,  as  described  in  the  previous  section  as  leader-  follower  control. 
Manipulability  describes  how  effectively  the  follower  agents  are  being  controlled  by  the  leaders  at 
any  point  in  time.  As  opposed  to  controllability,  which  is  a  property  that  describes  between  which 
states  the  system  can  be  in,  manipulability  is  an  instantaneous  notion,  and  thus  fits  our  needs  for 
haptic  feedback. 

In  order  to  define  manipulability  explicitly,  we  first  need  to  introduce  some  notation.  Consider 
a  swarm  of  N  mobile  robots,  consisting  of  Nf  followers  and  IV)  leaders,  where  Nf  +  Ni  =  N.  At 
time  t.  robot  V s  position  is  given  by  Xi(t)  £  Md,  i  =  1, . . . ,  N,  where  d  is  the  spatial  dimension  of 
the  network,  e.g.,  d  =  2  corresponds  to  the  case  of  planar  robots,  d  =  3  represents  agents  that  move 
in  a  three  dimensional  space,  and  so  forth.  The  positions  are  aggregated  together  to  give  the  overall 
position  of  the  robot  team  at  time  t,  x(t)  =  [x\ ( t ), ...,  xJj(t)]T  £  WNd.  We  assume  that  the  indexing 
of  the  agents  is  such  that  the  first  Nf  agents  are  the  followers,  and  the  last  Ni  agents  are  the  leaders. 
Under  this  indexing  scheme,  x(t)  =  [xj(t),  xf  (t)]T ,  where  xj(t)  =  [x^  (t),  ...,xjj  (t)]'1  £  M.Nfd  and 

*i(t)  =  [xNf+l(t),-,xN(t)]T  G  RNld- 

Now  we  can  introduce  the  formal  definition  of  manipulability,  which  is  formulated  as  the  ratio 
between  the  leaders’  and  the  followers’  velocities,  i.e. , 


M  = 


(i) 


For  example,  if  you  have  two  multi-agent  networks  with  the  same  velocity  applied  to  the  leader 
in  each,  the  larger  the  magnitude  of  the  followers’  velocities,  the  higher  the  manipulability  index. 
This  is  illustrated  in  Figure  [IJ  where  the  robot  configuration  in  1(a)  has  a  lower  manipulability 


than  that  in |l(b)[  Here,  since  the  leaders  in  each  network  have  the  same  velocity,  the  difference  in 
the  follower  velocities  is  due  to  a  difference  in  interaction  topology  between  the  two  networks.  If 
the  two  networks  had  the  same  interaction  topology,  differences  in  manipulability  would  be  caused 
by  different  leader  velocities. 

Since  we  want  to  use  this  manipulability  index  for  haptic  feedback,  it  is  important  to  understand 
what  all  it  depends  on.  It  is  clearly  a  function  of  xi,  because  that  is  the  control  input  directly 
specified  by  the  user.  It  also  depends  on  where  each  of  the  robots  are  in  space,  x,  as  well  as  the 
structure  of  the  multi-agent  network,  i.e.,  which  robot  pairs  would  like  to  maintain  inter-robot 
distances.  In  this  leader-follower  network,  the  control  law  of  the  followers  is  designed  so  that 
adjacent  agents  maintain  desired  distances  between  each  other.  If  we  let  V  =  1, ...,  N  denote  the 
set  of  agents,  then  we  can  define  the  unordered  set  E  C  V  x  V  to  contain  the  robot  pairs  that 
are  adjacent  in  the  underlying  network.  By  combining  the  vertex  set  V  with  the  edge  set  E,  we 
form  the  undirected  graph  G  =  (V,E),  which  defines  the  information  exchange  network  of  the 


multi-agent  team  of  robots.  The  manipulability  index  given  in  Equation  (27)  also  depends  on  this 
graph. 


We  know  that  M  in  Equation  (27)  depends  on  xi,  x,  and  G,  but  in  order  to  compute  it,  we  also 
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’o  (3  ^0  6 


(a)  Lower  Manipulabil- 
ity 


(b)  Higher  Manipulabil- 
ity 


Figure  1:  Manipulability  comparison  between  two  leader- follower  multi-robot  networks  (Ni  =  1). 
The  filled  circle  in  each  network  is  the  leader  and  the  arrows  represent  the  agents’  velocities.  Here, 


the  network  on  the  left  has  a  lower  manipulability  than  the  network  on  the  right  due  to  the  follower 
velocities  being  smaller  in  magnitude.  In  this  case,  the  difference  in  the  follower  velocities  between 


the  two  networks  and  thus  the  difference  in  manipulability  would  be  caused  by  a  difference  in 
interaction  topology  (not  shown). 


need  to  know  xj.  However,  the  followers’  velocities  depend  on  the  choice  of  interaction  dynamics. 


This  can  be  remedied  by  developing  an  approximate  manipulability  measure  that  does  not  depend 
on  the  interaction  dynamics,  i.e., 


The  approximate  manipulability  measure  was  derived  by  using  the  rigid-link  approximation. 
The  rigid-link  approximation  assumes  that  the  desired  distances  {dij}^v.  „.)g£  between  connected 
agents  are  perfectly  maintained  by  the  followers  at  all  times.  In  other  words,  ||xj(i)  —  Xj(t)\\  = 
dij,V(vi,Vj )  G  E,t  >  0.  Under  the  rigid  link  approximation,  the  distances  between  connected 
agents  do  not  change  over  time.  Given  smooth  and  differentiable  trajectories  of  ),  this  means 


that 


Xi(t)  -  Xj(t)  ||2  =  0,  V(vi,Vj)  G  E,  t>  0, 


which  expands  to 


(xi  -  Xj)T(xi  -  ij)  =  0,  V(vj,  Vj)  G  E, 


(2) 


where,  for  the  sake  of  notational  simplicity,  we  have  dropped  the  dependence  on  t. 


Using  Q,  the  rigid-link  approximation  condition  can  be  written  in  matrix  form  as 


R(x)x  =  0, 


where  R{x)  G  RlElxJVd  is  the  so-called  rigidity  matrix  of  the  system,  and  where  | E\  is  the  cardinality 
of  the  edge  set.  Or,  if  we  split  this  into  the  parts  contributed  by  the  leaders  and  the  followers, 


R(x,G)  Xf  =\Rf(x,G)\Ri(x,G)}  Xf  =0 
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where  Rf  E  RlElxAr/rf  and  Ri  E  RlElxArirf.  This  allows  the  follower  velocities  to  be  directly  expressed 
as  a  function  of  the  leader  velocities,  as 

if  =  -R\(x,  G)Ri(x,  G)xi,  (3) 


where  is  the  Moore-Penrose  pseudoinverse  of  Rf. 


The  approximate  manipulability  measure  was  derived  using  this  relation  in  Equation  ([3])  and 
is  given  by 

xf  JT(x,  G)J(x,  G)xi 


M(x,±i,G )  = 


R 


(4) 


where  J(x,G )  =  —R^(x,G)Ri(x,G).  This  expression  for  approximate  manipulability  can  now  be 
used  to  provide  a  human  operator  with  haptic  feedback  about  the  instantaneous  effectiveness  of 
his  or  her  control  inputs. 


This  approximate  manipulability  measure  fits  all  of  the  criteria  for  a  well-suited  haptic  force 
discussed  in  the  previous  section.  First  of  all,  when  N[  =  1,  it  can  be  shown  that 


0  <  M  <  Nf. 

Because  M  is  the  ratio  of  squared  norms,  it  is  clear  that  M  >  0.  As  for  the  other  side  of  the 
inequality,  when  Ni  =  1,  Ri  can  be  expressed  in  terms  of  Rf  as 


Ri  -  -Rflf, 

where  If  =  1  Id,  where  ljvy  is  an  Afy-dimensional  column  vector  with  Is  in  all  of  its  entries,  (g> 
denotes  the  Kronecker  product,  and  I ^  denotes  the  d  x  d  identity  matrix.  By  substituting  this  Ri 
into  Equation  ([3]),  we  get 


Xf  =  -R^Rixi  =  RjRf(l Nf  <8>  Id)h  =  RJfRf(lNf  <S>  x{). 

Since  R^Rf  is  a  projection  matrix,  we  get  ||x/||2  <  Arj||x;||2.  Thus,  the  desired  result, 


—  pt 


—  pt 


M  = 


\\*f\\ 


\xi\ 


2<Nf > 


follows. 

This  means  that  approximate  manipulability  has  a  known  minimum  and  maximum  when  there 
is  one  leader  in  the  group.  Since  the  case  we  are  interested  in  is  a  single  human  operator  controlling 
a  single  leader  via  a  haptic  device,  this  assumption  is  fine. 

Manipulability  describes  how  effectively  the  followers  are  being  controlled  by  the  leader,  and 
a  higher  value  of  manipulability  is  better,  so  we  want  to  use  a  haptic  force  that  encourages  the 
human  operator  to  move  the  network  of  agents  in  directions  that  produce  a  high  manipulability 
value.  The  question  that  remains  is:  how  do  we  create  a  mapping  from  manipulability  to  haptic 
force?  One  option  is  to  apply  the  force  feedback  in  the  direction  opposite  the  one  that  the  user 
is  trying  to  move  the  leader  in.  In  this  case,  the  force  would  impede  the  motion  of  the  user,  so 
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Figure  2:  Manipulability  vs  haptic  force  mappings  used  in  user  experiments. 


we  map  high  manipulability  to  low  force  feedback  and  low  manipulability  to  high  force  feedback. 
This  way,  if  the  user  is  moving  the  leader  in  a  direction  that  produces  a  high  manipulability  of  the 
swarm,  little  feedback  force  would  resist  the  user’s  motion.  Alternatively,  if  the  user  is  moving  the 
leader  in  a  direction  that  produces  low  manipulability,  the  resistive  force  from  the  haptic  device 
would  encourage  the  user  to  choose  a  different  direction  in  which  to  move  the  leader. 

Using  this  type  of  force  feedback,  the  approximate  manipulability  of  the  swarm  and  the  haptic 
feedback  are  inversely  related.  There  are  many  ways  we  can  form  an  equation  for  the  mapping. 
It  seems  reasonable  that  the  maximum  manipulability  value  should  map  to  zero  haptic  feedback 
force,  so  the  user  does  not  experience  any  resistance  when  moving  the  leader  in  this  desirable 
direction.  Similarly,  the  minimum  value  of  the  approximate  manipulability  measure  should  map 
to  the  maximum  force  of  the  haptic  device.  Thus  we  need  to  map  the  range  [0,  Nf]  to  the  range 
[0,  H],  where  H  is  the  maximum  applicable  force  of  the  haptic  device  being  used. 

There  are  many  ways  that  this  mapping  can  be  done  and  the  simplest  such  way  is  through  a 
linear  mapping,  given  by 

F,(M)  =  H  (l  -  ,  (5) 

where  Fj  is  the  magnitude  of  the  haptic  force  meant  to  resist  the  user’s  motion,  M  is  the  approximate 
manipulability  of  the  team,  and  Nf  is  the  number  of  followers  in  the  swarm. 

This  linear  map  does  not  encourage  high  values  of  manipulability  in  a  particularly  powerful 
way.  Another  option  is  an  inverse  exponential  map,  given  by 

p—aM  _  „—aNf 

Fe(M)  =  H  x_e_aNf  .  (6) 

Here,  a  is  a  parameter  that  can  be  changed  to  adjust  the  rate  of  change  of  the  force  as  a  function  of 
manipulability.  Figure  [2]  contains  a  plot  showing  the  linear  mapping  and  the  exponential  mapping 
for  a  values  of  -2,  -0.5,  0.5,  and  2.  The  plot  was  made  using  Nf  =  4  and  H  =  1.  It  can  be  seen 
that  as  a  becomes  more  negative,  the  resistive  feedback  force  remains  high  for  a  larger  range  of 
manipulability  values,  indicating  that  the  mapping  should  be  more  forceful  in  encouraging  high 
values  of  manipulability. 

Now  that  we  know  that  approximate  manipulability  is  an  appropriate  team-level  property  for  use 
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Figure  3:  Initial  setup  of  the  robot  team,  where  the  leader  agent  is  the  right-most  robot.  The  two 
target  locations  are  circled. 


Figure  4:  Depicted  is  a  student  utilizing  haptic  device  while  looking  at  the  virtual  environment 
(middle  screen). 


with  haptic  feedback,  we  will  describe  the  experiments  used  as  proof  of  concept  for  this  method.  In 
fact,  we  implemented  haptic  manipulability  feedback  as  described  previously  using  a  haptic  device 
known  as  the  PHANTOM  Omni  by  SensAble  Technologies.  The  force  feedback  was  rendered 
and  applied  to  the  device  while  a  user  controlled  a  swarm  of  robots  under  the  leader-follower 
configuration.  The  user  controlled  the  velocity  of  the  single  leader  robot  with  the  haptic  device  at 
the  same  time  that  the  manipulability  forces  were  being  fed  back  through  the  device. 

A  network  of  five  mobile  agents  was  created  using  Khepera  III  differential  drive  robots.  The 
user  was  instructed  to  move  the  group  of  robots  to  one  target  location  and  then  to  the  other,  in 
either  order,  using  the  haptic  device  to  control  the  leader  robot.  Each  target  location  was  marked 
with  an  ’X’  on  the  floor.  The  initial  setup  of  the  robots,  along  with  the  target  locations,  can  be 
seen  in  Figure  [3}  In  addition  to  the  physical  robots,  a  virtual  environment  was  provided  to  the 
user  to  show  the  robots’  locations,  along  with  the  target  locations.  Completion  was  defined  by  the 
leader  being  on  top  of  the  target  locations.  Figure  [4]  shows  one  of  the  students  using  the  haptic 
device  to  control  the  swarm  while  looking  at  the  virtual  environment. 

The  network  configuration  chosen  for  this  team  of  robots  was  a  triangle  formation  with  two 
extra  robots  on  the  ends,  as  shown  in  Figure  [5}  The  lines  in  the  diagram  represent  the  edges,  or 
the  links  that  represent  which  agents  are  maintaining  distances  with  each  other.  Here,  if  the  agents 
are  numbered  1  through  5  from  left  to  right,  agent  1  is  only  connected  to  agent  2  and  agent  5  is 
only  connected  to  agent  4.  Therefore,  the  network  is  not  rigid,  meaning  it  does  not  necessarily  stay 
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in  the  same  shape  as  shown  in  Figure  [5]  while  it  is  being  moved  by  the  user. 


Figure  5:  Initial  configuration  of  multi-robot  team,  showing  the  connections  between  agents.  The 
filled  circle  represents  the  leader. 

The  haptic  force  magnitude  given  by  Fe  and  Fj  in  the  previous  section,  after  being  computed 
using  the  instantaneous  approximate  manipulability  of  the  system,  must  be  converted  into  a  value 
that  can  be  applied  to  the  haptic  device.  Only  the  x  and  z  degrees  of  freedom  of  the  haptic  device 
were  used  for  this  experiment.  When  the  subject  used  the  device  to  control  the  velocity  the  leader, 
the  x  and  z  positions  are  captured  by  the  program  and  converted  into  a  velocity  by  computing  the 
norm  and  the  direction  angle.  Let  xpos  and  zpos  be  the  positions  obtained  from  the  haptic  device. 
The  magnitude  and  angle  are  computed  as  in  Equations  0  and  ([8]). 


mag  =  y J  z%os  +  x2pos 

6  =  arctan(TIHA) 


(7) 

(8) 


After  the  magnitude  of  the  haptic  feedback  force  is  computed,  which  is  a  value  between  0  and 
H ,  it  is  used  to  scale  the  input  velocity  given  by  the  user.  Here  we  take  H  to  be  1.  This  is  slightly 
different  than  the  forces  mentioned  in  the  previous  section.  This  is  done  so  that  the  magnitude  of 
the  feedback  force  generated  by  the  haptic  device  can  be  at  most  the  magnitude  of  the  force  exerted 
by  the  user  onto  the  device.  In  other  words,  the  haptic  feedback  force  will  only  be  as  strong  as  the 
force  exerted  by  the  user  onto  the  device  while  he  or  she  is  controlling  the  motion  of  the  leader.  In 
practice,  this  allows  the  force  feedback  to  come  on  gradually  and  helps  keep  the  haptic  device  from 
becoming  unstable. 

In  addition,  both  the  x  and  z  components  are  negated  so  that  the  output  force  is  in  the 
opposite  direction  of  the  input  velocity.  Thus,  as  described  previously,  the  force  that  the  haptic 
device  generates  is  a  repulsive  force  that  is  intended  to  impede  the  motion  of  the  haptic  device  in 
certain  directions.  Below  are  the  equations  used  for  the  force  rendering,  where  F  represents  the 
function  that  maps  the  approximate  manipulability,  M ,  of  the  swarm  to  the  haptic  force  (either  Fj 
or  Fe  from  the  previous  section),  and  mag  and  6  come  from  Equations  ([T])  and  ([8]),  respectively. 
The  x  and  z  haptic  forces  that  are  published  to  the  PHANTOM  Omni  after  being  computed  are 
given  below  as  Fx  and  FZl  while  the  y  component  is  always  set  to  zero. 


9 

DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Figure  6:  Depicted  is  a  team  of  Khepera  III  robots  nearing  a  target  location. 


sgn(xpos)  F(M)  mag  |cos(0)| 

(9) 

sgn(%>0s)  F{M)  mag  |sin(0)| 

(10) 

The  haptic  feedback  described  here  was  successfully  implemented  and  the  user  was  able  to 
move  the  swarm  of  robots  while  feeling  the  resistive  forces  of  the  haptic  device.  The  team  of  robots 
moving  during  one  of  the  experiments  can  be  seen  in  Figure  |6|  where  the  leader  is  the  robot  with 
the  white  styrofoam  object  on  top  of  it  and  the  ’X’  on  the  floor  marks  one  of  the  target  locations. 


2.2  A  Formal  Approach  to  Human-Swarm  Interactions 

Our  objective  is  to  determine  whether  it  is  possible  for  a  human  operator  to  use  a  particular  human- 
swarm  interaction  (HSI)  to  achieve  some  geometric  configuration  with  a  swarm  of  mobile  robots. 
To  establish  feasibility,  we  first  need  to  know  what  a  HSI  represents  in  terms  of  the  structure  it 
imposes  on  a  multi-robot  system,  and  what  it  means  for  a  human  operator  to  achieve  a  particular 
geometric  configuration  with  the  robotic  swarm. 

In  general,  we  consider  continuous-time,  time-invariant  systems  with  inputs,  which  represent 
robotic  swarms  that  can  be  externally  controlled  (or  interacted  with)  by  a  user.  The  dynamics  of 
such  multi-robot  systems  can  be  defined  as  x(t)  =  f  (x(t) ,  u(t)) ,  where  x{t)  E  X  is  the  state  of  the 
system  at  time  t  and  u{t)  E  U  is  the  input  to  the  system  at  time  t.  In  fact,  x(t)  will  represent 
the  stacked  vector  of  the  states  belonging  to  all  robots  at  time  t,  while  Xi(t)  will  refer  to  the  state 
of  robot  i  at  time  t.  For  example,  x(t)  will  typically  represent  the  position  or  pose  of  all  robots 
together  at  time  t. 

More  importantly,  the  differentiable  function  /  :  X  x  U  — >  TX,  where  TX  is  a  tangent  space, 
is  structured  according  to  the  network  topology  of  the  multi-robot  system.  The  network  topology 
is  given  by  a  graph  Q  =  (V,  E),  where  V  is  the  set  of  vertices  representing  the  agents,  and  E  is  the 
set  of  edges  representing  information  exchange  between  agents  via  communication  links  or  due  to 
sensor  footprints.  Specifically,  /  E  sparse^ [Q)  conveys  that  state  information  in  the  multi-robot 
system  can  only  flow  between  agents  that  are  linked  in  the  network  topology.  Consequently, 

/  E  sparsest/)  <=>  fj  £  N(i )  =»  -  -  =  0,  ,  (11) 
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where  N(i)  is  the  so-called  neighborhood  of  robot  i,  i.e.,  j  G  IV(i)  if  ( i,j )  G  E,i,j  G  V,  and 
JV(i)  =  JV(i)  U  {*}. 

By  picking  a  particular  HSI  structure,  we  are  being  specific  about  the  structure  of  U,  i.e.,  how 
the  user  can  interact  with  the  robotic  swarm.  Our  definition  is  as  follows: 

A  HSI  structure  is  a  map 

H-.XxV^U,  (12) 

where  V  is  some  set  of  admissible  inputs  to  make  the  corresponding  robotic  swarm  more  amenable 
to  human  control.  Additionally, 

f(x,H(x,v))  =  fH(x,v)  G  sparsex{G),  (13) 

which  means  that  the  dynamics  f  under  this  map  H  needs  to  observe  the  sparsity  structure  imposed 
by  the  network  topology. 

This  definition  of  a  HSI  structure  implies  that  the  control  input  to  the  system  is  really  a 
combination  of  state  feedback  and  a  restricted  set  of  inputs  from  the  user,  which  respects  the 
constraints  imposed  by  the  network  topology.  Consequently,  the  dynamics  of  a  multi-robot  system 
under  such  a  HSI  structure  are 


x(t)  =  f(x(t),u(t)) 

=  f(x(t),H(x(t),v(t))  (14) 

=  fH(x(t),v(t)). 

Therefore,  a  HSI  structures  is  a  very  specific  way  in  which  the  user  controls  the  multi-robot  system, 
i.e.,  interacts  with  the  robotic  swarm. 

For  example,  suppose  that  a  robotic  swarm  consists  of  n  mobile  robots  positioned  on  a  rail 
(xi(t)  G  R)  with  single-integrator  dynamics, 


Xi(t)  =  Ui(t ),  i  =  {1, . . .  ,n}, 

(15) 

where  the  control  input  for  the  first  n  —  1  robots  is 

Ui(t)  =  (Xj(t)  -  Xj(t)). 

(16) 

jeN(i ) 

N(i)  denotes  the  neighborhood  of  robot  i,  which  is  the  set  of  all  its  immediate  neighbors  in  the 
network  topology  derived  from  communication  links  or  sensor  footprints. 

The  control  input  for  the  n-th  robot  is 

un(t)  =  v(t),  v(t)  G  V,  (17) 
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which  corresponds  to  the  user  directly  controlling  the  position  of  the  n-th  robot.  This  HSI  structure 
is  commonly  referred  to  as  a  single-leader  network ,  because  the  user  interacts  with  the  swarm  of 
robots  by  guiding  a  “leader”  robot,  while  the  other  robots  follow  the  leader  and  each  other  according 


to  the  consensus  dynamics  in  (16). 


If  we  stack  all  Xj(f)’s  into  a  state  vector  x(t )  £  Rn  and  all  Uj(t)’s  into  an  input  vector  u(t)  £  R" 
then  the  ensemble  dynamics  of  our  example  system  are 


x(t )  =  u{t)  =  —Lfx{t)  +  lv(t), 


0 


£  Rn 


0 

1 


(18) 


where  Lf  is  a  sub-block  of  the  graph  Laplacian  L  with  all  entries  in  the  re-th  row  of  L  equal  to  zero. 
Consequently,  the  single-leader  network  HSI  structure  is  a  particular  structuring  of  the  control 
input  u{t )  in  (18)  given  by  the  function  H ,  such  that 

u(t)  =  H(x(t),v(t))  =  — Lfx(t )  +  lv(t),  (19) 


where  v{t )  £  R  is  the  user  input. 

Finally,  when  a  multi-robot  system  under  some  HSI  structure  can  asymptotically  converge  to  a 
state,  a  subset  of  states,  or  all  states  in  a  specification  set  S  and  stay  in  this  set,  then 


limsup  d(x(t),  S)  =  0,  (20) 

t—¥  OO 


where, 


d(x(t),S)  =  inf  II x(t)  —  s||.  (21) 

s£S 

If  this  is  true,  then  we  say  that  the  user  can  achieve  some  or  all  of  the  geometric  configurations 
described  by  S  with  the  robotic  swarm. 


These  definitions  of  HSI  structures,  together  with  stability  guarantees  using  Control  Lyapunov 
Functions  and  measures  of  user  attention  and  effort,  were  investigated  on  a  number  of  different 
HSI  structures,  including  broadcast  signals,  leader-follower  control,  and  manipulation  of  geometric 
swarm  aggregates. 


2.3  Human-Swarm  Interactions  via  Coverage 

Coverage  control  is  one  area  of  multi-agent  control  that  has  received  significant  attention  lately 
and  it  is  concerned  with  how  to  position  agents  in  such  a  way  that  “surveillance”  of  a  domain 
of  interest  is  maximized.  In  this  context,  an  idea  that  has  been  widely  adopted  to  describe  how 
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interesting  a  “domain  of  interest”  is,  is  to  associate  a  density  function  to  the  domain. However,  the 
focus  of  previous  coverage  algorithms  has  largely  been  on  static  density  functions,  which  does  not 
provide  enough  flexibility  when  human  operators  are  to  adaptively  interact  with  a  team  through 
a  dynamic  re-shaping  of  the  density  functions.  This  research  is,  in  contrast,  able  to  handle  time- 
varying  density  functions  as  human  inputs  to  the  swarm,  and  Fig.  [7]  illustrates  how  the  proposed 
approaches  allow  a  user  to  manipulate  a  team  of  robots  to  drive  to  certain  areas  of  the  environment. 


The  Coverage  Problem 

We  are  looking  for  for  decentralized  update  laws  that  allow  a  human  operator  to  influence  the 
swarm  of  agents  by  providing  a  density  function  characterizing  the  areas  of  importance  within  the 
domain.  The  agents  in  the  robot  swarm  will  seek  to  optimize  their  coverage  over  the  domain  under 
the  influence  of  the  user-provided  density.  In  order  to  discuss  optimal  coverage,  one  first  has  to 
associate  a  cost  to  a  robot  configuration  that  describes  how  well  a  given  area  is  being  covered. 

Let  D  C  l2  be  the  two-dimensional  convex  domain  representing  the  robot  workspace.  Moreover, 
let  i \  D  X  [0,  oo)  — >  (0,  oo)  be  the  associated  density  function,  which  we  will  assume  is  bounded  and 
continuously  differentiable  in  both  arguments,  and  where  <j)(q ,  t)  captures  the  relative  importance 
of  a  point  q  £  D  at  time  t.  For  the  sake  of  Human-Swarm  Interactions  (HSI),  this  density  function 
will  be  an  user-provided  input  to  the  robot  swarm. 

The  coverage  problem  involves  placing  n  robots  in  D,  and  we  let  pi  €  D,  i  =  l,...,n  be 
the  position  of  the  ith  robot.  Moreover,  the  domain  itself  is  divided  into  regions  of  dominance, 


Figure  7:  A  team  of  five  Khepera  robots  are  influenced  by  a  human  operator  to  drive  to  certain 
parts  of  the  environment.  The  tablet  is  representative  of  the  robot  environment.  An  overhead 
projector  allows  us  to  overlay  the  broadcasted  density  on  the  environment  in  real-time. 
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P\, ...  ,Pn  (forming  a  proper  partition  of  D ),  where  the  idea  is  to  let  robot  i  be  in  charge  of 
covering  region  Pi.  One  can  then  ask  how  good  the  choice  of  p  and  P  is,  where  p  =  \jjJ  , . . .  ,Pn]T , 
and  P  =  {P\,  ■  ■  ■  ,Pn}-  The  final  piece  needed  to  answer  this  question  is  a  measure  of  how  well  a 
given  point  q  E  D  is  covered  by  robot  i  at  position  pi  e  D. 

We  consider  the  locational  cost  given  by 

n  r. 

H{p,P,t)  =  y2  \\q- Pi\\2Hq,t)dq.  (22) 

i=i Jp i 


At  a  given  time  t,  when  a  configuration  of  robots  p  together  with  the  partition  P  minimize, 
the  domain  is  said  to  be  optimally  covered  with  respect  to  < f>.  However,  it  is  possible  to  view  the 
minimization  problem  as  a  function  of  p  alone,  by  observing  that  given  p ,  the  choices  of  Pi  that 


minimize  (22)  is 


Vi(p)  =  {qe  D  |  \\q~Pi\\  <  \\q~Pj\\  ,i  /  j}- 


This  partition  of  I?  is  a  Voronoi  tessellation  —  hence  the  use  of  V)  to  denote  the  region.  With 
this  choice  of  region,  we  can  remove  the  partition  as  a  decision  variable  and  instead  focus  on  the 
locational  cost 

n  r 

1 24>(q,t)dq 


(23) 


IIj  n 

'H(jp,t)  =  V  /  Ik -p* 

JVi(p) 

where  4>{q,  t )  is  the  user-provided  input  to  the  system. 

It  can  be  shown  that 

=  [  ~2(q-  Pi)T4>(q^)dq,  (24) 

dpi  JVi 

and  since  cj>  >  0,  one  can  define  the  mass  mi  and  center  of  mass  c*  of  the  zth  Voronoi  cell,  V),  as 


mi(p,t)=  /  (f>{q,t)dq, 

JVi(p) 


Ci(p,t )  = 


;  (p) 

fVi(p )  riMdq 


mi 


(25) 


(26) 


For  convenience,  we  will  stack  the  masses  into  a  single  diagonal  (positive-definite)  matrix  denoted 
by 


r  Tn  i 


M  = 


m  2 


®Id, 


(27) 


where  “(8>”  is  the  Kronecker  product  and  Id  is  the  d  x  d  identity  matrix  where  d  corresponds  to  the 
dimension  of  the  domain  (d  =  2  is  considered  here).  We  also  group  the  centers  of  mass  as  a  single 


14 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


vector  c  =  [cf, . . .  ,  c^]  .  Using  these  quantities  allows  us  to  rewrite  the  partial  derivative  in  (24) 

(28) 


as 


=  2 mi{pi  -  Ci)T. 

dpi 


From  this  expression,  we  can  see  that  a  critical  point  of  (47)  is 

Pi{t)  =  Ci(p,t),  i  =  l,---,n, 


(29) 


and  a  minimizer  to  (47)  is  necessarily  of  this  form.  Moreover,  when  (29)  is  satisfied,  p  is  a  so-called 
centroidal  Voronoi  tessellation  (CYT). 


Centralized  Coverage  of  Time- Varying  Densities 


We  are  now  in  pursuit  of  a  control  laws  that  allow  us  achieve  a  CVT,  even  if  the  density  function 
is  time-varying.  In  order  to  move  forward  with  our  analysis,  we  need  to  understand  some  of  the 
properties  associated  with  the  center  of  mass  of  a  given  Voronoi  tessellation.  We  first  note  that 
the  partial  derivatives  of  the  center  of  mass  with  respect  to  the  agents  positions  has  eigenvalues 
bounded  by  one,  as  long  as  the  agents  are  close  enough  to  a  CVT,  and  under  the  assumption  that 
at  the  CVT  the  locational  cost  is  locally  strongly  convex. 

Lemma  1.  If  the  CVT  is  locally  strongly  convex,  then  there  exists  6  >  0  such  that  if 
|| p(t)  —  c(p(t),t) ||  <  6  then  Re(A)  <  1  for  all  A  €  eig(J^). 

Corollary  1.  As  long  as  the  agents  are  close  enough  to  a  CVT  configuration  for  which  the  locational 
cost  is  locally  strongly  convex,  the  matrix  inverse  (I  —  §^)-1  exists. 


We  use  this  corollary  to  obtain  our  first  result  on  maintaining  a  CVT  for  all  time. 
Theorem  1  (Maintaining  CVT).  Let  p  (to)  =  c  {p  (to) ,  to) .  If 


dc\  1  dc 


p  '  1  dp)  dt' 


t  >  to 


then 


\\p(t)-c(p(t),t)\\=0,  t>t0 


as  long  as  the  inverse  (I  —  dc/dp)  1  is  well-defined. 


Proof:  Assume  the  agents  begin  from  a  CVT  configuration,  i.e.,  p(to)  =  c(p(to) ,  to)-  We  need  to 
ensure  that 

Ti  (p  if)  ~  c(p(t)  ,t))  =  0,  Vt  >  tQ. 
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But  this  implies  that  p  =  c  =  p  +  || ,  which  can  be  rearranged  into  the  form 


as  long  as  the  inverse  is  well-defined.  From  Lemma  [lj  a  sufficient  condition  for  this  is  that  the 
initial  CVT  makes  the  locational  cost  locally  strongly  convex.  □ 

In  order  for  this  theorem  to  be  meaningful,  we  require  that  the  agents  first  achieve  a  CVT 
configuration  which  is  then  maintained.  This  control  law  also  assumes  the  agents  will  be  able  to 
instantaneously  accelerate  to  the  required  velocities,  which  may  not  be  possible  in  practice.  In 
order  compensate  for  saturation,  modeling  errors  and  deviations  from  the  CVT,  a  proportional 
term  can  be  introduced  that  forces  the  agents  into  a  CVT.  This  update  law  is  called  the  TVD-C 
which  stands  for  time-varying  densities,  centralized  case. 

p(t )  =  (j~  (p(*)  “  c(pW> *))  +  (30) 

where  k  >  0  is  a  proportional  gain.  It  is  noteworthy  that  this  proportional  term  influences  the  team 
of  robots  to  move  towards  a  (scaled)  gradient  descent  direction  to  achieve  a  CVT  configuration, 
and  that  once  a  CVT  is  achieved  the  proportional  term  does  not  contribute  to  the  update  law  and 
Theorem  Q]  holds. 

As  long  as  the  inverse  is  well-defined,  this  update  law  will  drive  the  agents  into  a  CVT.  This  is 
encapsulated  in  the  following  theorem. 

Theorem  2  (Convergence  of  TVD-C).  If  1  ^  eig  and  we  let 

(-,t(p-c)  +  !)'  ‘-t0 

then  || p(t)  —  c(p(t),t) ||  ->  0  as  t  — >  oo  with  rate  of  decrease  exp  (— n(t  —  to)). 


Proof: 

Consider  the  total  derivative  for  || p(t)  —  c(p(t),t) ||2  below 
^  (IIP  “  CH2)  =  2  (p  -  c)T  (p  -  c)  =  2  (p  -  cf 
=  2  (p  -  c)T  ( 

=  2  (p-c)T( 

=  —2k  \\p  —  c|| 
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which  tells  us  that 


f  (lip  -  c||2)  =  —2k  ||p  —  c||- 

This  is  a  homogeneous  linear  differential  equation  which  for  initial  condition  ||p(to)  —  c(p(to),to)|| 
has  solution  given  by 

]| P(t)  -  c{p(t),t) ||2  =  exp(-2 K(t  -  t0))  \\p(to)  -  c{p(t0),t0)\\2 
=>  \\p(t) -c(p(t),t)\\  =  exp(-/c(t  -  t0))  ||p(t0)  -  c(p(t0),t0)\\  • 

Hence  || p(t)  —  c(p(t),t) ||  — >  0  as  t  — >  oo.  Further,  we  see  that 


lb(t)  -c(p(t),t)|| 
II P(t)  ~  c(p(t),t) || 
lb(to)  -  c(p(to),to)|| 


exp (~K(t  -  t0))  |b(to)  -  c(p(t0),to)|| 
exp(-«(t  - 10)) 


such  that  the  rate  of  decrease  is  given  by  exp  (— n{t  —  to)). 


□ 


Distributed  Coverage  of  Time- Varying  Densities 


Recall  that  by  combining  equations  ( 25 )  and  rt26^ ,  we  have  that 


_  /vj(p)g^(g»*)  dg 


which  depends  on  p  in  the  boundary  of  the  area  over  which  the  two  integrals  are  taken. 


In  order  to  compute  these  partials,  we  first  need  to  make  use  of  Leibniz  rule. 


Lemma  2.  Let  Ll(p)  be  a  region  that  is  a  smooth  function  of  p  such  that  the  unit  outward  normal 
vector  n  is  uniquely  defined  almost  everywhere  on  dLl,  which  is  the  boundary  of  fL  Let 


Then 


where 


dq 

dp 


is  the  derivative  of  the  points  on  dLl  with  respect  to  p. 


17 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Vj), 


It  can  now  be  shown  that  for  any  point  q  G  dVij  (the  boundary  between  adjacent  cells  Vt  and 


dp 


■  ( Pj  ~  Pi)  =  \eb  •  ( pj  -  pi)  —  eb  ■  fq~  Pl  +  Pj 


j 

■  (Pj  ~  Pi)  =  \eb  •  ( pj  —  Pi)  +  eb  ■  [  q  — 


dp i 


2 

Pi  +  Pj 


where  p^  denotes  the  6th  component  of  the  vector  pj  and  eb  is  the  6th  elementary  unit  vector.  Note 
that  in  this  chapter,  6=1,2  since  we  are  considering  the  case  Del2  only. 

Substituting  this  into  Leibniz  rule,  we  obtain 


dc. 


(a) 


dp 


(b) 


r  a -  v(b) 

IdV.i  \\Pj-Pi\\ 


dq  /  rrii 


+  (  [  a)dq )  (  [ 

\JVi(P)  J  \JdVij  \\Pj-Pi\\ 


dq  /  rrif  (31) 


where  a  =  1,2,  6=1,2  and  where  i  ^  j.  When  i  =  j  we  must  consider  the  contribution  from  all 
neighbors 


dc) 


(a) 


E 


BpP  teNv.  L 


7 

JdVik  \\Pk-Pi\\  J  ' 


[  ^dq)  ( [  J  U 

/vi(P)  \Jdvik  \\Pk-Pi\\  Jf 


(32) 


We  can  rewrite  these  equations  more  compactly  using  block  matrix  notation 

dci  fdVii  i.Q-Ci)(q-Pj)T<t>dq 


dc 

dp 


dc 

dp 


j  dPj 

_  dcj  _ 

dpi 

th  *th 


E 

keNv. 


m\\pj-Pi\\ 

f9Vik  ( q-Ci)(q-pi)T(j)dq 
m\\Pj-Pk\\ 


where  [-\ij  corresponds  to  the  i,  jth  d  x  d  block  matrix. 


(33) 

(34) 


It  is  noteworthy  that  given  a  continuously  differentiable  density  function  (j),  computing  dcj  dp 
at  any  given  time  t  becomes  an  exercise  in  finding  line  and  area  integrals.  In  implementation,  it 


18 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


suffices  to  use  numerical  approximations  to  compute  these  integrals  (e.g.,  Riemann  sums,  Gaussian 
quadrature  rule). 


One  more  partial  derivative  is 
of  Leibniz  rule  results  in 


required  for  update  law  (30),  namely  dc/dt.  Another  application 


da  _  mi  fVi  (9’ t)  d q  -  fv.  q<f>(q,  t )  d q  jy(p)  $(q,  t )  d q 
dt  m 2 


or  more  compactly 


With  | T 


aciT  dcnT 

dt  ’  •  ' '  >  dt 


da  = 

dt  rrii 


(35) 


(36) 


When  implementing  the  update  law,  it  suffices  to  numerically  compute  the  integrals  in  dc/dt. 
However,  unlike  with  the  computation  of  the  integrals  in  dc/dp ,  we  require  knowledge  of  d<f/dt. 
If  cj)  is  not  provided  analytically  in  t,  then  one  could: 


1.  Utilize  a  finite  difference  scheme  to  approximate  dcj)/dt.  This  could  however  give  rise  to 
difficulties  with  noisy  measurements. 

2.  Alternatively,  the  user  could  provide  the  time  evolution  of  the  density  function  directly  via 

a  continuous  function  d(j)/dt  such  that  (fcq,t)  =  J*  dr.  In  the  implementations,  the 

“shape”  of  the  density  is  defined  as  a  continuously  differentiable  function  cj)(q,  to)  defined  over 
D,  and  the  as  the  user  provided  input  corresponds  to  d<f/dt. 

As  a  final  implementation  note,  finding  the  integrals  over  V)  may  be  computationally  intensive. 
However,  the  expanded  versions  of  these  partial  derivatives  reveal  that  only  a  few  integrals  actually 
need  to  be  computed  per  agent.  To  compute  all  expressions  it  suffices  to  compute  the  integrals 
of  (j>,  d(j)/dt ,  qcj),  and  qd(j)/dt  over  Vi,  which  may  be  computed  once  for  each  agent  (and  may  be 
computed  in  a  distributed  fashion). 

The  second  subtle  difficulty  with  update  law  pO])  is  ensuring  that  the  inverse  (7  —  dc/dp )_1  is 
well  defined,  which  is  in  general  hard  to  do.  Lemma  [T]  provides  us  with  a  sufficient  condition  for 
the  existence  of  this  matrix  and  the  desired  approximation  can  be  found  by  using  the  Neumann 
series. 

Lemma  3  (Neumann  series).  Let  A  be  a  square  matrix.  If  liiri/^oc  Ak  =  0,  then  I  —  A  is  invertible 
and 


(7  -  A)'1  =  I  +  A  + A2  +  A3 +  .... 
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Moreover,  for  a  m  x  m  square  matrix  A,  lim^^  Ak  =  0  if  and  only  if  | A*  |  <  1  for  all  i  = 
1,  2,  •  •  •  ,  rri.  where  A*  are  the  eigenvalues  of  A.  As  such,  let  \max  denote  the  eigenvalue  with  the 
largest  magnitude  of  the  matrix  dc/dp.  Using  the  Neumann  series,  we  can  express  (/  —  dc/dp)^1 
as 


dc  A  1  dc  f  dc  A 2 

dp)  dp  V dp )  ~*~ 


(37) 


as  long  as  |Amax|  <  1. 

Our  goal  will  be  to  truncate  this  series  to  obtain  the  well-defined  approximation  to  the  matrix 
inverse,  but  then  the  question  arises:  how  many  terms  should  be  kept?  The  answer  lies  in  the 
sparsity  structure  of  dc/dp. 

Given  a  Voronoi  partition  of  the  area  of  interest,  we  denote  the  boundary  between  the  two  cells 
Vi  and  Vj  by  dVij.  Since  we  are  only  considering  the  planar  case,  there  are  three  possibilities  for 
dVif 


1.  dVij  is  empty,  meaning  that  cells  V)  and  Vj  do  not  intersect. 

2.  dVij  consist  of  a  single  point,  meaning  that  cells  Vi  and  Vj  share  a  single  vertex. 

3.  dVij  is  a  line,  meaning  that  cells  Vi  and  Vj  share  a  face. 


We  will  denote  Nyi  to  be  set  of  indexes  pertaining  to  the  agents  whose  Voronoi  cells  Vj  share  a  face 
with  agent  i’s  Voronoi  cell  V). 


Lemma  4.  j  ^  Ny. 


Proof:  For  the  first  two  cases,  i.e. ,  dVij  is  either  empty  or  consists  of  a  singleton,  from  (31 )  and  (32 ) 
we  see  that  the  integrals  over  dVij  would  be  zero.  Note  that  for  these  two  cases,  this  will  be  true 
for  all  four  elements  in  dci/dpj.  Since  these  two  cases  correspond  to  agents  i  and  j  not  sharing  a 
face,  we  conclude  that  j  £  Nyi  implies  that  dci/dpj  =  0. 

This  lemma  tells  us  that  dc/dp  actually  encodes  adjacency  information  of  the  graph  induced 
by  the  Voronoi  tessellation.  This  induced  graph  is  known  as  the  Delaunay  graph. 

To  obtain  a  distributed  update  law,  we  must  insist  that  the  update  for  pi  depends  only  on 
information  from  itself  (pi  and  <f(q,  t)  for  q  E  V.)  and  information  on  neighboring  agents  (pj  and 


f(q,  t )  for  q  €  Vj,  for  all  j  E  Nyt).  To  this  end,  we  truncate  the  Neumann  series  in  (37)  after  just  two 


entries,  i.e.,  (/  —  dc/dp )  1  ~  I  +  dc/dp.  By  modifying  update  law  (30)  with  this  approximation, 
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we  obtain  the  update  law 


^=(7+|)  ('K(p-c)  +  l)- 

which  at  the  individual  robot  level  results  in  the  update  law  called  TVD-D\  for  Time- Varying 
Densities,  Distributed  case  with  1-hop  adjacency  information: 

K  =  ^  -  Kfe  -  C)  +  E  (t|  -  K<PJ  -  <*>)  (38) 


where  Nyi  is  the  closed  neighborhood  set  to  Voronoi  cell  i  in  the  Delaunay  graph. 


It  should  be  noted  that  ( 38 )  is  always  well-defined  (as  long  as  4>  is  continuously  differentiable) . 
In  other  words,  even  if  the  Neumann  series  is  not  convergent  or  if  the  inverse  does  not  exist,  the 
entries  in  (|38|)  are  well-defined.  In  fact,  it  turns  out  that  during  the  robotic  experiment,  even  in 
cases  where  |Ama3;|  >  1,  the  robots  consistently  evolve  in  a  manner  that  achieves  coverage. 


One  can  now  investigate  what  happens  when  higher  order  terms  are  kept  in  the  Neumann 
series.  For  this,  we  let  dist(i,j )  denote  the  edge  distance,  or  number  of  edges  in  the  shortest  path, 
between  i  and  j  in  the  Delaunay  graph  induced  by  the  Voronoi  tessellation.  And,  as  dc/dp  is  a 
(block)  adjacency  matrix,  we  have  that 


(dc/dp)  /  0 


jij 


dist(i,j)  =  k ,  k  =  0,1,2,... 


where  [  •  }ij  denotes  the  block  corresponding  to  cell  c*  and  robot  position  pj. 


The  fc-hop  version  of  TVD-D\  thus  becomes 


(39) 


Robotic  Experiments 

These  controllers  for  human-swarm  interaction  where  implemented  on  robotic  platforms  to  validate 
the  approaches  to  HSI.  For  every  agent,  only  it  and  its  neighbors’  state  and  density  information 
in  their  respective  Voronoi  cells  were  used  to  compute  the  update  laws.  Line  integrals  were  ap¬ 
proximated  by  Riemann  sums,  whereas  Gaussian  quadrature  rule  was  used  for  area  integrals.  The 
human  operator  used  an  iPad  to  input  the  density  locations.  These  were  transmitted  over  WiFi 
and  UDP  to  a  central  computer  which  calculated  the  update  law  for  each  agent.  Even  though  a 
central  computer  is  used  for  the  computation,  the  control  is  computed  for  every  agent  only  using 
adjacency  information,  and  then  it  is  transmitted  to  the  pertinent  robot  via  WiFi  and  UDP. 
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The  density  design  was  implemented  on  an  Ubuntu  computer  with  an  Intel  dual  core  CPU 
2.13GHz  and  4GB  of  memory,  running  ROS  (Robot  Operating  System,  version  Diamondback) . 
This  computer  also  received  state  information  from  ten  OptiTrack  S250e  motion  capture  cameras 
that  were  used  to  provide  position  and  orientation  data  for  the  state  information  of  an  agent  and 
its  neighbors  was  used  to  compute  the  Voronoi  tessellation.  The  robotic  platforms  used  for  the 
experiments  were  Khepera  III  robots  from  K-team.  The  Khepera  III  robots  each  have  a  600MHz 
ARM  processor  with  128Mb  RAM,  embedded  Linux,  differential  drive  wheeled  robots  equipped 
with  a  wireless  card  for  communication  over  a  wireless  router.  The  rviz  package  in  ROS  was  used 
for  visualization  of  the  user-provided  density  function.  The  visualization  was  overlapped  with  the 
real  physical  environment  to  give  a  real-time  visual  representation  of  the  user-provided  density 
function.  This  is  shown  in  Fig.  [8j 


(a)  User  influencing  swarm  to  center  of  domain  (b)  User  influencing  swarm  to  corner  of  domain 


(c)  User  splits  the  swarm  by  introducing  multi-  (d)  User  observes  position  of  swarm  and  redi- 
ple  locations  of  interest  rects  it  with  the  use  of  a  tablet 


Figure  8:  Robotic  implementation  of  HSI  via  coverage  of  Gaussian  functions.  A  tablet  is  used  to  directly 
influence  the  location  of  the  swarm  formation.  An  overhead  projector  is  used  to  visualize  the  user-provided 
density  function  in  real-time  over  the  robot  workspace. 
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3  Pursuit-Evasion  Problems 


The  main  motivation  behind  this  work  has  been  our  desire  to  extend  differential  game  and  pursuit- 
evasion  problems  to  the  case  of  multiple  agents  (both  pursuers  and  evaders)  while,  at  the  same 
time,  take  into  consideration  the  effect  of  external  known  or  unknown  disturbances  (for  example, 
wind  fields  in  the  case  of  unmanned  aerial  vehicles  or  sea  currents  for  the  case  of  marine  vehicles). 
External  disturbances,  in  particular,  open  the  possibility  of  some  of  the  agents  (evaders  or  pursuers) 
using  additional  information  about  the  environment  in  order  to  achieve  their  objective  faster  or 
more  efficiently.  Such  concealment  or  deceptive  strategies  lead  to  games  of  asymmetric  information 
that  have  been  studied  extensively  in  economic  game  theory  but  not  so  much  in  the  context  of 
pursuit-evasion  games. 

The  overall  objective  is  thus  to  increase  the  ambiguity  in  terms  of  the  team  intentions  as  they 
proceed  to  complete  their  common  mission  objective.  As  a  particular  instance  of  the  proposed 
methodology  we  have  investigated  the  problem  of  group  pursuit  of  a  maneuvering  target  by  a  team 
of  unmanned  vehicles.  Since  it  is  not  be  assumed  a  priori  that  the  target  is  slower  than  its  pursuers 
(as  is  typically  the  case  for  the  majority  of  pursuit-evasion  games  of  this  type)  coordination  amongst 
the  pursuing  agents  is  necessary  to  ensure  capture.  The  problem  of  group  pursuit  of  a  maneuvering 
target  deals  with  the  characterization  of  the  individual  pursuit  strategies  for  each  agent  from  the 
group  of  pursuers  such  that  the  collective  objective  of  capturing  the  target  is  achieved.  Capture  of 
the  target  can  either  take  place  by  only  one  pursuer  (the  pursuer  that  will  reach  the  target  first),  or 
by  multiple  pursuers  simultaneously.  The  group  pursuit  problem  becomes  more  challenging  if  one 
seeks  for  pursuit  strategies  that  require  only  local  information  (distributed  group  pursuit  problem) 
and/or  when  the  target  is  moving  faster  than  the  pursuers. 

The  overarching  objective  of  this  research  is  therefore  to  study  motion  coordination  problems 
for  a  group  of  adversarial  agents,  whose  aim  is  to  accomplish  certain  collective  objectives  of  the 
group  while  maintaining,  at  all  times,  their  actions  concealed  from  an  adversarial  opponent. 

The  benefits  of  this  research  to  the  US  military  are  significant,  since  they  will  allow  the  deploy¬ 
ment  of  teams  of  autonomous  agents  (e.g.,  UAVs)  with  increased  decision-making  capabilities.  The 
results  from  this  research  can  increase  the  scope  and  types  of  missions  these  teams  of  autonomous 
agents  will  be  able  to  undertake.  The  collaborative  strategies  developed  under  the  proposed  re¬ 
search  plan  can  ensure  higher  mission  success  for  multi- agent  systems,  at  a  time  when  all  branches 
of  the  military  assign  an  ever  increasing,  more  active,  role  to  autonomous  systems  in  the  battlefield. 
Taking  advantage  of  the  external  disturbances  (as  opposed  to,  say,  try  to  cancel  or  nullify  them) 
can  yield  better  time  and/or  fuel-optimal  solutions,  while  the  possibility  of  asymmetric  knowledge 
of  the  external  disturbances  (along  with  the  dilemma  of  revealing  this  additional  knowledge  or  not 
to  the  opponent)  can  lead  to  deceptive  strategies  that  outperform  strategies  derived  based  on  the 
common,  full-information  assumption. 
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3.1  The  Suicidal  Pedestrian  Differential  Game 


In  this  research  topic  we  assumed  a  pursuit-evasion  game  where  the  pursuer  is  agile  but  faster 
than  the  evader,  but  the  evader  is  subject  to  maneuverability  constraints.  This  problem  is  the 
reverse  of  the  celebrated  homicidal  chauffeur  game  of  Isaacs.  Our  objective  was  to  find  the  optimal 
strategies  of  both  the  evader  and  the  pursuer  and  also  find  the  boundary  of  the  winning  set  of  the 
pursuer,  that  is,  all  initial  conditions  of  the  pursuer  that  will  lead  to  capture.  Both  point  capture 
and  proximity  capture  were  investigated. 

The  outcome  of  this  game  is  defined  by  the  game  parameters  (speed  ratio  and  maneuverability 
restrictions  on  the  evader)  and  the  initial  conditions  (initial  relative  distance  between  the  players 
and  the  evader’s  initial  velocity  vector  orientation).  The  objective  was  to  describe  the  relation 
between  this  set  of  parameters/initial  conditions  and  the  game  outcome. 


Problem  Formulation  and  Main  Results. 


The  equations  of  motion  for  the  pursuer  and  the  evader,  written  in  an  inertial  frame  of  reference 


with  coordinates  x  and  y  are  given  by 

xp  =  vp  cos  <j>p,  (40) 

ijp  =  vp  sin  <i)p,  (41) 

Xe  =  VeCOS(pe,  (42) 

Ve  =  ve  sin^e,  (43) 

<i>e  =  -V^u,  u  G  [—1, 1],  (44) 


To  proceed,  we  transform  the  problem  from  previous  fifth-dimensional  realistic  game  space  to 
a  two-dimensional  reduced  game  space,  by  fixing  the  origin  of  a  rotating  coordinate  frame  at  E’s 
current  position  and  by  aligning  the  y-axis  with  E’s  velocity  vector.  The  evader  action  then  consists 
of  choosing  her  center  of  curvature  at  a  point  C  =  (R/u,  0)  on  the  x-axis.  Consequently,  the  reduced 
game  space  has  only  two  coordinates,  namely  the  (x,  y)  coordinates  of  P  relative  to  E  in  the  evader’s 
fixed,  velocity-aligned  frame.  The  equations  of  motion  of  P  in  this  frame  are  given  by 


x  =  yu  -  vpsm<l>, 

'V 

y  =  -p.xu  -  ve  -  Vpcoscj),  it  e  [-1,1], 

li 


(45) 

(46) 


where  (j)  is  the  pursuer’s  relative  heading  in  this  new  reference  frame,  given  by  (j)  =  ir  —  (j)p  +  <j>e.  The 
game  terminates  when  capture  occurs,  that  is,  when  the  relative  distance  between  the  evader  and  the 
pursuer  becomes  less  than  i.  The  manifold  contained  within  the  game  space  which,  once  penetrated, 
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signals  the  game  termination,  is  called  the  terminal  surface.  The  terminal  surface  for  our  game  is  a 
circle  with  radius  £  centered  at  the  origin,  i.e.,  the  evader’s  position.  We  may  thus  formally  define 
the  terminal  surface  by  C  =  {x  6  M2  :  |x|  =  and  the  game  space  as  £  =  jx  £  M2  :  |x|  >  £}.  The 
terminal  surface  is  then  the  boundary  of  the  game  space. 

Retaining  the  concept  of  the  payoff  as  it  appears  in  the  theory  of  zero-sum  games,  we  assign 
the  values  +1  for  escape  (termination  does  not  occur)  and  —1  for  capture  (termination  occurs,  i.e., 
the  circle  is  penetrated  by  the  pursuer’s  trajectory).  An  additional  step  is  necessary  in  the  process 
of  constructing  the  barrier  of  the  game.  Namely,  we  need  to  distinguish  the  critical  case  in  which 
the  state  x  reaches  the  terminal  surface  but  does  not  cross  it,  ultimately  returning  back  into  the 
interior  of  S.  The  case  when  the  terminal  surface  is  reached  without  penetration,  is  referred  to  as 
neutral  outcome,  and  the  corresponding  payoff  is  assigned  zero  value.  Thus,  given  an  initial  state 
x(to)  at  t  =  to  and  the  pursuer  and  evader  control  histories,  </>(•)  and  u(-)  respectively,  the  payoff 
formally  reads  as 


J(x(t0),  </>(•),«(•)) 


+1,  for  no  termination  (escape), 
<  0,  neutral  outcome, 

—  1,  for  termination  (capture). 


(47) 


We  thus  seek  to  solve  the  problem  of  conflicting  actions  represented  by  u  (maximizing  control)  and 
<f>  (minimizing  control)  that  maximize/minimize  the  payoff  (47)  under  the  dynamic  equations  (45) 
and  (46).  Our  goal  is  to  obtain  an  analytic  expression  for  the  barrier  surface,  which  consists  of  all 
starting  points  that  lead  to  the  neutral  outcome. 


The  first  step  towards  a  solution  of  the  game  is  to  obtain  the  usable  part  of  the  terminal  surface 
C.  It  is  not  uncommon  for  a  terminal  surface  of  a  game  to  be  divided  into  two  regions:  the  Usable 
Part  (UP)  and  the  Nonuseable  Part ,  which  are  separated  by  what  is  known  in  the  literature  as  the 
Boundary  of  the  Usable  Part  (BUP).  The  usable  part  is  the  subset  of  the  terminal  surface  on  which 
the  pursuer  can  enforce  capture,  namely,  penetration  of  the  terminal  surface.  The  nonusable  part 
is  the  remaining  part  of  the  terminal  surface.  On  it  the  game  would  end  only  if  the  evader  does 
not  play  optimally. 


To  obtain  the  UP,  we  parameterize  the  terminal  surface  with  the  variable  s  as  shown  in  Figure 
H  by 

(x,  y)  =  (£  sins,  £  cos  s),  xfC,  (48) 

and,  omitting  several  details,  the  usable  part  of  the  terminal  surface  turns  out  to  satisfy 


cos  s  >  — - , 

ve 


,7T  , 

* e  (g’71-] 


(49) 


while  the  BUP  is  thus  determined  through 


t  P\ 
s  =  arccos( - ), 


-  ^  i 


(50) 
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Barrier 


y'  s  \ 

■  /  ^ 

[/  X 

Nonusable  Part 


Figure  9:  The  terminal  surface  C  of  the  game,  which  is  a  circle  of  radius  £.  It  is  divided  into  the  usable 
part  (the  dark  line),  and  the  nonusable  part,  separated  by  the  BUP.  The  BUP  connects  to  the  barrier,  which 
meets  the  terminal  surface  at  the  BUP  tangentially. 


An  illustration  of  the  usable  part,  the  BUP  and  the  nonusable  part  is  given  in  Figure  [9} 

Having  identified  the  BUP,  we  turn  our  attention  to  the  construction  of  the  barrier.  The  barrier 
is  a  semipermeable  surface,  that  is,  optimal  play  by  both  agents  starting  from  any  point  will  generate 
a  trajectory  that  does  not  penetrate  this  surface.  Without  getting  much  into  details,  we  enforce 
the  Isaacs’  equation  for  games  of  kind,  obtain  the  retrogressive  path  equations  and  after  applying 
the  suitable  boundary  conditions,  we  connect  the  barrier  tangentially  to  the  BUP.  The  analytic 
expression  of  the  barrier  curve  turns  out  to  be: 


x(t)  =  —R  +  Rcos(ct)  +  (£  +  vpr)  sin(s  —  cr),  (51) 

y(r)  =  Rsm(cr)  +  (£  +  vpt)cos(s  -  ct),  rG[0,rmax],  (52) 


where  r  is  the  retrograde  time  ( tf  —  t ),  and  s  is  given  by  (50).  Equations  (51)  and  (52)  separate  the 
game  space  into  two  regions;  a  region  in  which  optimal  play  of  the  pursuer  leads  to  capture  and  a 
region  in  which  optimal  play  of  the  evader  leads  to  evasion.  To  obtain  rmax,  it  is  important  to  note 
that  the  barrier  expression  is  invalidated  as  soon  as  two  barrier  branches  intersect  -  the  part  of  the 
barrier  arc  beyond  the  point  of  intersection  is  then  no  longer  valid  and  is  therefore  discarded.  In 
our  case,  the  two  branches  of  the  barrier  intersect  on  the  y-axis,  because  of  the  inherent  symmetry 
of  the  problem  at  hand.  Thus,  we  may  obtain  rmax  as  the  root  of  x(r)  =  0,  i.e.,  rmax  is  the  solution 
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of  the  transcendental  equation 


H-  ^p^max)  sin(s  CTmax)  —  R  R  cos(cTmax) . 


(53) 


Figure  10  depicts  the  barrier  for  ve  =  1,  vp  =  0.6,  R  =  0.7,  l 


0.5.  Notice  that  the  barrier  meets 


The  Barrier 


Figure  10:  The  barrier,  given  by  equations  (51)  and  (52),  for  ve  =  1,  vp  =  0.6,  R  =  0.7,  t  =  0.5.  Notice 
that  the  barrier  meets  the  terminal  surface  tangentially  at  the  BUP. 


the  terminal  surface  at  the  BUP  tangentially. 

Remark  1 .  The  cases  in  which  the  agents  have  the  same  speed,  or  point  capture  is  considered,  can 
be  obtained  by  directly  substituting  vp  =  ve  or  t  =  0  respectively. 


After  answering  the  question  of  which  states  leads  to  capture  as  opposed  to  evasion,  we  in¬ 
vestigated  the  time-optimal  problem  when  the  outcome  is  capture,  that  is,  we  considered  initial 
states  within  the  capture  zone  delineated  by  the  barrier  and  examine  the  characteristics  of  the 
time-optimal  capture  trajectories.  We  demonstrate  that,  for  the  special  case  of  point-capture,  the 
time  optimal  problem  is  equivalent  to  Zermelo’s  Navigation  Problem  in  optimal  control. 
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Point  Capture:  Time-Optimal  Trajectories  and  Connection  to  Zermelo’s  Navi¬ 
gation  Problem 


Zermelo’s  Navigation  Problem  (ZNP)  is  a  well-known  result  in  optimal  navigation,  which  has 
received  a  lot  of  attention  in  the  literature.  Initially  stated  by  the  German  mathematician  E. 
Zermelo  in  1931,  the  problem  formally  reads:  “In  a  given  vector  field  of  currents,  which  is  a  function 
of  position  (and  possibly  time),  a  vehicle  moves  with  constant  speed  relative  to  the  currents.  How 
should  the  vehicle  be  navigated  in  order  to  reach  a  given  destination  in  minimum  time?”.  In  ZNP, 
the  equations  of  motion  of  the  vehicle  are  given  by 

x  =  v  cos  0  +  U(x,  y),  (54) 

y  =  vsincf)  +  Q(x,y),  (55) 


where  U,  Q  are  known  functions  that  correspond  to  the  components  of  the  vector  field  along  the 
x  and  y  directions,  respectively,  and  f>  is  the  heading  angle  with  respect  to  the  x-axis  (the  control 
input).  The  goal  is  to  minimize  time  until  the  vehicle  reaches  a  target  location. 


Returning  to  the  original  differential  equations  ( 45 )-( 46 ) ,  it  is  easy  to  observe  that  since  E’s 


optimal  strategy  is  u  =  —  1  (for  initial  conditions  of  the  pursuer  in  the  right-half  plane),  equations 


(45)  and  (46)  assume  the  same  form,  with  a  change  in  the  angle  convention.  Thus,  E’s  optimal 


control  results  in  an  induced  vector  field  akin  to  a  current,  which  P  needs  to  overcome  in  order  to 
intercept  E  in  minimum  time.  This  vector  field  is  shown  in  Figure  m  This  fact  allows  us  to  use 
the  well-known  Zermelo ’s  Navigation  Formula  which  states  that  the  optimal  control  <f>*  obeys 


=  Sill 


,dQ(x,y ) 
dx 


+  sin  4>*  cos  cj>* 


dU(x,y )  dQ{x,y) 


dx 


dy 


—  cos 


,  dU  (x,  y) 
dy 


which,  for  U(x,y )  =  vey/R  and  Q(x,y)  =  —vex/R  —  ve,  yields 


Ve 

R ' 


(56) 


(57) 


The  problem  therefore  reduces  to  a  two-point  boundary  value  problem  consisting  of  integrating  the 


state  equations  along  with  (57),  subject  to  initial  conditions  (x,y)  and  cf>*( 0)  that  will  lead  to  a 


trajectory  passing  through  the  origin  (0,  0).  Alternatively,  one  can  consider  integrating  this  system 
of  ODEs  backwards  in  time,  i.e.,  by  flipping  the  sign  of  the  ODE’s  and  using  the  variable  r,  subject 
to  the  retrograde  boundary  conditions  (x,  y)  =  (0,  0)  and  a  variable  retrograde  boundary  condition 


4>*j  €  [0,  27 r]  for  (57).  This  will  yield  a  parametric  family  of  curves,  and  it  remains  to  locate  the  one 
that  passes  through  the  original  point  (x,y)  of  interest.  In  fact,  this  integration  can  be  performed 
analytically  to  obtain  the  following  parametric  family  of  curves 


x(0^;r)  =  —  R  +  Rcos(ct)  +  vpt  sin(^  —  cr), 
y(<t>}',r)  =  Rsin(cr)  +vptcos(4>)  ~  cr),  r  G  [0,rmax], 


(58) 

(59) 
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Vectorfield  induced  by  the  Evader  Strategy 


Figure  11:  Applying  the  evader’s  strategy  induces  a  vector  field  that  resembles  a  current,  which  P  needs  to 
overcome  in  order  to  intercept  E  in  minimum  time.  Plotted  for  vp  =  ve  =  1,  R  =  0.7. 


where  is  the  free  parameter  and  rmax  is  the  solution  to  the  transcendental  equation 

UpTrnax  sill(0y  CTmax)  —  R  R  COs(cTmax) . 


(60) 


Figure  [T2]  illustrates  several  time  optimal  trajectories,  members  of  the  parametric  family  of  curves 
given  by  (58)  and  (59),  corresponding  to  different  values  of  The  barrier,  i.e. ,  the  rightmost  time 
optimal  trajectory,  is  obtained  for  <j)*j  =  tv,  and  is  identical  to  the  barrier. 


Domain  Decomposition  in  the  Case  of  a  Faster  Pursuer 

In  this  section  we  investigate  the  case  when  the  pursuer  is  not  only  completely  agile,  but  is  also 
faster  than  the  evader  (that  is,  the  speed  ratio  a  =  vv/ve  >1).  As  already  stated,  this  case  leads 
to  global  capturability,  namely,  the  evader  is  captured  regardless  of  the  initial  conditions  of  the 
pursuer  and  the  evader.  Although  capture  is  guaranteed  on  the  entire  Euclidean  plane,  it  is  shown 
that  the  Euclidean  plane  can  be  partitioned  by  a  boundary  B  separating  two  different  types  of 
solutions  exhibited  in  the  game:  one  in  which  the  evader  is  captured  while  performing  a  hard  turn, 
and  another  one  in  which  the  evader,  having  completed  the  turn,  is  captured  during  a  straight-line 
dash  while  trying  to  avoid  the  pursuer  (also  referred  to  as  the  end  game).  The  region  inside  the 
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ZNP  Time  Optimal  Trajectories  that  lead  to  Capture 


Figure  12:  Members  of  the  parametric  family  of  curves  given  by  (58)  and  (59),  corresponding  to  different 
values  of  v  =  1,  R  =  0.7.  The  time  to  capture  is  also  shown  for  the  case  in  which  the  game  is  initialized 
as  a  head-on  engagement.  For  clarity,  only  the  case  u*  =  —1  is  shown.  The  curves  for  it*  =  1  are  symmetric 
with  respect  to  the  y-axis. 


boundary  B  corresponds  to  initial  conditions  that  lead  to  the  first  type  of  solution.  For  initial 
conditions  of  the  game  outside  of  the  boundary,  the  second  type  of  solution  manifests  itself.  The 
shape  of  the  boundary  B  resembles  in  appearance  the  one  of  the  barrier  for  the  case  a  <  1.  A 
geometric  procedure  to  obtain  the  boundary  B  in  the  realistic  plane  leads,  in  the  case  of  point 
capture,  to  a  boundary  defined  by 


(61) 
(62) 

The  boundary  in  this  case  is  depicted  in  Figure  [13] 


x(9)/R  =  cos  9  +  a9s\n9  —  1, 
y(9)/R  =  sin  9  —  a9  cos  9. 


Extension  to  the  Case  of  Multiple  UAVs 

We  turned  our  attention  to  the  case  in  which  the  evader  faces  several  pursuers.  To  this  end, 
consider  the  game  involving  one  evader  and  an  arbitrary  number  -  say,  N -  of  pursuers,  moving  on 
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Boundary  for  Point  Capture,  a=2 


(a) 


Boundary  for  Point  Capture,  a=1 


X/R  Amax 

(b) 


Figure  13:  The  boundary  B  in  the  case  of  point  capture,  for  (a)  a  =  2,  and  (b)  a  =  1.  For  a  >  1 
the  area  enclosed  by  these  curves  corresponds  to  initial  conditions  that  lead  to  capture  while  the  evader  is 
turning,  whereas  initial  conditions  outside  the  area  enclosed  by  the  curve  lead  to  capture  after  the  evader 
has  completed  her  turn  and  while  she  performs  a  straight-line  dash  away  from  the  evader  (end  game).  For 
a  =  1  this  curve  separates  the  regions  of  capture  (inside)  and  escape  (outside)  and  it  thus  coincides  with  the 
barrier. 


a  plane.  The  subscripts  pi  and  e  will  be  reserved  to  designate  the  “i-th  pursuer”  and  the  “evader,” 
respectively.  As  before,  the  pursuers’  objective  is  capture,  that  is,  interception  of  the  evader  in  finite 
time,  whereas  the  evader’s  objective  is  evasion,  a  state  in  which  she  avoids  interception  indefinitely. 
Here,  interception  is  to  be  understood  in  the  context  of  coincidence  of  the  Cartesian  coordinates 
between  the  evader  and  at  least  one  of  the  pursuers  (point-capture).  This  will  make  the  subsequent 
analysis  a  bit  easier,  but  the  results  obtained  can  be  modified  to  account  for  proximity  capture  as 
well.  For  this  scenario,  and  for  the  sake  of  simplicity,  it  will  be  assumed  that  all  agents  have  the 
same  constant  speed,  that  is,  ve  =  vPi  =  v  for  all  i  =  1, . . . ,  N.  As  before,  the  pursuers  are  assumed 
to  be  agile,  in  the  sense  that  they  can  change  the  direction  of  their  velocity  vector  instantaneously. 
On  the  other  hand,  the  evader  is  less  agile  and  cannot  make  turns  that  have  a  radius  smaller  than 
a  minimum  turning  radius  R.  The  dynamic  equations  consist  of  N  copies  of  the  suicidal  pedestrian 
dynamics,  i.e., 


xi  =  -vsmcpi, 

v  A 

Vi  =  YXiU  ~  v  cos  Vi  ~  vi 


i  =  l,...,N,  u<E  [-1, 1], 


(63) 

(64) 


where  ( Xi,\ji )  6  M2  are  the  coordinates  of  the  i-th  pursuer  in  the  reference  frame  fixed  on  the 
evader  position,  aligned  with  its  velocity  vector,  and  (pi  is  the  i-th  pursuer’s  control,  given  by 
<pi  =  7r  —  <pPi  +  i pe.  We  will  now  make  the  following  assumption: 
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Assumption  1.  For  all  i  =  1, . . . ,  N,  we  have  y*(0)  >  0. 


This  assumption  essentially  translates  into  all  the  pursuers  populating  at  t  =  0  only  the  upper 
half  plane;  in  other  words,  the  evader  faces  all  the  pursuers  at  the  beginning  of  the  game,  rather 
than  having  pursuers  at  her  back  (or,  alternatively,  that  all  UAV’s  at  her  back  are  sufficiently  far 
away  to  be  neglected).  Recall  that,  if  pursuers  are  located  both  in  front  of  the  evader  as  well  as  at 
her  back,  and  the  evader  is  in  the  interior  of  the  convex  hull  generated  by  the  pursuer  positions, 
then  a  completely  agile  evader  will  have  no  strategy  leading  to  evasion.  Since  the  set  of  evader 
strategies  in  our  problem  is  just  a  subset  of  the  set  of  strategies  of  a  completely  agile  evader, 
capture  is  guaranteed  in  our  case  as  well.  Furthermore,  it  is  reasonable  to  assume  that  once  a 
head-on  collision  has  been  avoided  by  the  evader  against  all  incoming  UAVs,  these  will  not  turn 
around  to  pursue  the  evader. 


Game  Termination 


We  will  now  establish  the  conditions  under  which  the  game  terminates.  Game  termination  occurs 
if  the  position  of  at  least  one  of  the  pursuers  becomes  identical  with  that  of  the  evader  (collision), 
or  if  all  pursuers  enter  the  lower  half  plane  (evasion).  The  latter  condition  guarantees  evasion 
because  no  velocity  vector  component  of  the  evader  points  towards  a  pursuer,  so  the  evader  is 
able  to  escape  without  changing  her  heading.  This  condition  for  evasion  is  equivalent  to  the  evader 
being  able  to  construct  a  hyperplane  separating  her  from  the  pursuers  and  orient  her  velocity  vector 
perpendicular  to  that  hyperplane,  as  depicted  in  Figure  [14)  In  our  game  setup,  since  the  evader 


velocity  vector  is  at  all  times  aligned  with  the  y-axis,  we  may  take  that  hyperplane  to  be  the  x-axis, 
arriving  at  the  same  conclusion. 


Extending  the  Barrier  of  the  Suicidal  Pedestrian  Game 

When  the  two  agents  have  equal  speed  capabilities,  the  expression  for  the  barrier  is 

x(t)  =  —  R  +  Rcos(cr)  +  vt  sin(cr),  (65) 

y{r)  =  R  sin (cr )  -  vtcos(ct),  t  £  [0,rmax],  (66) 

where  t  =  tf  —  t  is  the  retrograde  time  variable  starting  from  game  termination,  say  at  tf,  and 
c  =  v/R.  To  obtain  rmax,  recall  that  the  barrier  expression  in  the  two  player  game  is  invalidated  as 
soon  as  two  barrier  branches  intersect  -  the  part  of  the  barrier  arc  beyond  the  point  of  intersection 
is  then  no  longer  valid  and  is  therefore  discarded.  If  the  evader  is  able  to  terminate  the  game 
without  capture,  then  the  evader  is  able  to  induce  a  trajectory  that  leads  to  the  lower  half  plane 
in  the  evader  fixed  reference  frame.  This  is  equivalent  to  the  evader  being  able  to  successfully 
eliminate  her  velocity  vector  component  pointing  towards  the  pursuer  before  capture  occurs.  The 
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Figure  14:  Evasion  is  guaranteed  if  the  evader  is  able  to  construct  a  hyperplane  separating  him  from  the 
pursuers  and  apply  his  velocity  vector  perpendicular  to  that  hyperplane. 


states  for  which  the  evader  is  able  to  do  so,  as  opposed  to  the  states  she  is  not,  are  separated  by 
the  barrier. 


In  the  two-player  game,  the  extensions  of  the  barrier  surfaces  beyond  rmax,  which  corresponds 
to  intersection  on  the  y-axis,  are  of  no  particular  significance;  indeed,  changing  the  evader  strategy 
from  u  =  —  1  to  u  =  1  in  the  case  in  which  the  pursuer  is  positioned  in  the  left  quadrant  instead 
of  the  right  quadrant,  greatly  reduces  the  capture  area  (see  Figure  [l5|(a).  However,  in  the  case 
of  multiple  pursuers,  and  if  both  quadrants  are  populated  with  pursuers,  these  extensions  do  offer 
some  important  information.  In  Figure  [T5|(b) ,  consider  only  «Si,  the  barrier  if  the  evader  chooses 
u  =  —  1  to  respond  to  pursuer  1.  Then  if  pursuer  2  is  outside  the  capture  area  delineated  by  that 
barrier  branch,  the  evader  has  enough  time  to  eliminate  the  velocity  vector  component  pointing 
towards  her,  although  she  initially  increases  it  as  she  turns  towards  pursuer  2,  in  order  to  avoid 
pursuer  1.  By  choosing  u  =  —  1  or  u  =  +1,  the  evader  essentially  chooses  the  capture  surface  to 
be  either  the  one  provided  by  Si,  or  S2,  respectively.  We  therefore  retain  the  parts  of  Si  and  S2 
beyond  Tmax  and  thus  redefine  a  new  maximum  retrograde  time,  say  rm,  for  which  equations  (65) 
and  (|66[)  are  valid,  to  be  the  first  nonzero  time  in  which  the  barrier  curve  hits  the  :r-axis  (see  Figure 
[I5][b)).  Then,  rm  is  the  unique,  non-zero  solution  of 


i?sin(crm)  =  VTmcos(cTn 


T~m.  >  0. 


(67) 
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The  Barrier:  Equal  Speeds,  Point-Capture 


Figure  15:  (a)  The  barrier  for  the  case  of  one  evader  against  one  pursuer,  given  by  equations  (65)  and  (66) 
for  v  =  1  and  R  =  0.7.  (b)  The  barrier  surfaces  if  multiple  pursuers  are  present.  S i  separates  the  states 
that  lead  to  capture  from  those  that  lead  to  evasion  if  the  evader  chooses  u  =  —  1,  while  S 2  corresponds  to 
u  =  +1. 


The  ZNP  Solution  and  Time-to-Capture 


The  connection  between  the  Suicidal  Pedestrian  game  and  the  well-known  Zermelo’s  Navigation 
Problem  (ZNP)  of  optimal  control  has  been  highlighted  in  our  previous  work.  Noticing  that  the 
dynamics  exhibit  the  same  form  as  those  considered  in  ZNP,  one  may  apply  Zermelo’s  formula  to 
calculate  the  rate  of  the  optimal  control  of  the  pursuers: 

4>*  =  (68) 


for  any  fixed,  constant  value  u  of  the  evader  control,  that  is,  whenever  u  is  not  a  function  of 
time  or  state.  In  the  two-player  case,  the  evader’s  control  is  bang-bang  and  without  switching 
until  game  termination.  Thus,  for  u  =  1  or  u  =  —  1,  one  may  integrate  the  resulting  system 
of  ordinary  differential  equations,  subject  to  initial  conditions  (x,y)  and  </>*( 0)  that  will  lead  to 
a  trajectory  passing  through  the  origin  (0,0).  However,  it  is  more  convenient  to  perform  this 
integration  backwards  in  time,  by  switching  the  signs  of  the  ODE’s,  using  the  origin  as  boundary 
condition  and  a  variable  retrograde  boundary  condition  (f)*^  E  [0,  2n]  for  (68).  This  will  yield  a 
parametric  family  of  curves,  and  it  remains  to  locate  the  one  that  passes  through  the  original  point 
(x,y)  of  interest.  In  fact,  this  integration  can  be  performed  analytically  to  obtain  the  following 
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Time  to  Capture 


(a) 


(b) 


Figure  16:  (a)  Several  time  optimal  trajectories  leading  to  capture,  members  of  the  parametric  family  of 
curves  obtained  by  application  of  Zermelo’s  navigation  law.  Solid  line  trajectories  correspond  to  the  evader’s 
control  being  u  =  —  1  while  dashed  to  u  =  +1.  (b)  Time  to  Capture  (TtC),  obtained  by  storing  the  t- variable 
values  across  the  optimal  trajectories,  and  then  interpolating  these  values  over  a  selected  grid  of  choice. 


parametric  family  of  curves 


t )  =  —R  +  Rcos(ct )  +  UTsin(</>j  —  cr),  (69) 

t  )  =  i?  sin  (cr)  +  ursin(^  -  cr),  r  G  [0,rm],  (70) 


where  u  =  —1  and  (f>*j  is  the  free  parameter.  A  mere  reflection  on  the  y-axis  yields  the  optimal 
trajectories  if  u  =  +1  is  used  instead.  Several  time  optimal  trajectories  leading  to  capture  are 
depicted  in  Figure 


16 


a),  for  various  values  of  4>*j. 


A  further  interesting  fact  of  this  connection  between  the  SPG  and  the  ZNP  is  that  the  parametric 
family  of  curves  allows  for  a  simple  calculation  of  the  Time  to  Capture  (TtC)  over  the  entire 
domain  in  which  the  evader  will  be  intercepted,  given  her  choice  of  fixed  control  input.  To  obtain 
a  characterization  of  TtC  as  a  function  of  the  evader’s  fixed  control  input  u  and  space,  one  simply 
has  to  store  the  t  values  across  the  optimal  trajectories,  and  then  interpolate  these  values  over 
a  selected  grid  of  choice.  The  result  of  this  procedure  is  depicted  in  Figure  [l6](b) .  Note  that  in 
the  two-player  game,  when  the  evader  faces  only  one  pursuer,  the  TtC  function  corresponds  to 
the  Value  of  the  game,  and  can  be  obtained  in  the  same  manner  by  storing  the  capture  time  of 
trajectories  within  the  two-player  capture  region,  delineated  by  the  barrier  in  Figure  pj~5|( a ) . 


An  important  remark  concerning  collision  avoidance  when  multiple  agents  are  in  the  vicinity, 
and  a  worst-case  scenario  of  agent  hostility  is  assumed,  is  the  following.  By  comparing  the  TtC 
functions  that  correspond  to  u  =  —1  and  u  =  +1,  the  evader  is  able  to  prioritize  between  the  threats 
posed  by  the  surrounding  adversaries.  The  function  level  sets,  depicted  in  Figure  [T7|(a) ,  indicate 
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Isochrones 


-3.5  -3  -Z5  -2  ^Tfi  ^1  -(15  o" 


x 


(a) 


(b) 


Figure  17:  (a)  Isochrones  -the  value  function  level  sets-  when  the  evader’s  control  is  u  =  —1;  plotted  with 
colored  lines  and  time  optimal  trajectories  with  dashed,  (b)  Although  P2  is  closer  to  the  evader  than  PI 
(indicated  by  a  circle  centered  on  E),  PI  poses  a  more  imminent  threat  as  can  be  seen  by  the  isochrones. 
Thus  the  evader  can  increase  his  time  to  capture  by  reacting  to  PI  with  u  =  +1  instead  of  reacting  to  P2 
with  u  =  —1. 


that  the  Euclidean  distance  of  the  pursuer  relative  to  the  evader  is  not  the  most  suitable  metric  for 
the  purposes  of  identifying  which  pursuer  poses  the  most  imminent  threat.  An  example  of  assessing 
threat  is  shown  in  Figure  [T7)(b).  Although  P2  is  positioned  closer  to  the  evader  compared  to  PI, 
as  indicated  by  the  circle  centered  at  the  evader,  the  evader  has  the  incentive  to  react  against  PI 
instead  of  P2,  in  order  to  further  increase  the  time  to  capture. 


Necessary  and  Sufficient  Conditions  for  Evasion 


Consider  the  multi-pursuer  game  described  so  far,  in  which  all  agents  have  the  same  constant  speed 
and  the  initial  positioning  is  such  that  the  evader  is  facing  all  pursuers.  We  formally  define  the 
sets  A  and  B  as  the  sets  of  all  points  enclosed  between  and  the  x-axis  and  £2  and  the  x-axis, 
respectively  (see  Figure  15  "b)).  To  this  end,  let  p(r)  =  ^/x2(r)  +  y2(r)  and  substitute  (65)  and 


(66)  to  obtain 


P(r)  =  —  cos(ct))  +  v2t 2  —  2Rtv  sin(cr),  r  €  [0,  rm], 

The  change  of  variables 

0  =  9i(r)  =  arctan(y(r)/ x(r )),  r  e  [0,rm],  6  G  [0, 7r] . 


(71) 


(72) 
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yields  (pog1  1)(0)  as  the  expression  of  in  polar  coordinates,  while  the  expression  of  £2  is  similarly 
obtained  by  setting 

9  =  52 (t)  =  arctan(-y(r)/x(r)),  r  E  [0,  rm],  9  E  [0, 7r],  (73) 


to  account  for  the  reflection  on  the  y-axis.  Then,  keeping  in  mind  that  the  Cartesian  coordinates 
(x,y)  are  transformed  to  polar  coordinates  (r,6)  through  x  =  rcos8,  y  =  rsinf?,  we  define 

A  =  {( x,y )  :  x  =  r  cos#,  y  =  rsin0,  9  E  [0, 7r],  0  <  r  <  (p  o  gf1)^)},  (74) 

while  B  is  similarly  defined 

B  =  {(x,y)  :  x  =  rcos9,  y  =  r  sin0,  9  E  [0, 7r],  0  <  r  <  (po  g ^_1)(0)}.  (75) 


We  now  proceed  to  obtain  necessary  and  sufficient  conditions  for  evasion.  To  this  end,  for  the  sets 
A  and  B  defined  by  (74)  and  (75)  respectively,  let 

I A  = 


1,  if  at  least  one  pursuer  is  in  A,  that  is,  (atj,  yf)  E  A  for  some  i,  i  =  1, . . . ,  N 
0,  otherwise. 


(76) 


Ib 


1,  if  at  least  one  pursuer  is  in  B,  that  is,  (xj,  yt)  e  B  for  some  i,  i  =  1, . . . ,  N 
0,  otherwise. 


(77) 


and  define  T,  =  I  a  +  Ib-  Notice  that  £  E  {0, 1,  2},  and  that  every  possible  positioning  of  pursuing 
agents  on  the  upper  half  of  the  Euclidean  plane  corresponds  to  a  value  in  £. 


Proposition  1.  //£  E  {0,1},  the  evader  escapes. 


Proof.  We  investigate  the  two  cases  separately: 

•  Case  £  =  0.  This  value  corresponds  to  an  arbitrary  number  of  pursuers  being  placed  outside 
of  A  and  B.  In  this  case,  the  evader  may  choose  either  u  =  1  or  u  =  —  1  since  in  both  cases, 
a  time  optimal  trajectory  leading  to  capture,  that  passes  through  the  position  of  at  least  one 
of  the  pursuers,  does  not  exist.  This  means  that  the  evader  has  enough  time  to  perform  her 
evading  maneuver.  The  trajectories  in  the  evader  fixed  reference  frame,  for  both  cases  of 
evader  control,  are  depicted  in  Figure  [l8| 

•  Case  £  =  1.  This  implies  that  an  arbitrary  number  of  pursers  are  in  either  A  \  {A  D  B)  or 
B  \  (An  B),  but  not  in  the  intersection  An  B.  Similarly  to  the  previous  case,  the  evader 
has  enough  time  to  complete  her  turning  maneuver,  i.e. ,  has  a  control  action  for  which  no 
time-optimal  trajectory  leading  to  capture  passing  through  a  pursuer  location  exists.  The 
difference  however  is  that  this  control  action  is  uniquely  determined  (u  =  1  for  pursuers  in 
A  \  (An  B)  or  u  =  —1  for  pursuers  in  B  x  (^4  D  B)).  An  example  of  this  case  is  depicted 
in  Figure  [l5|(b).  The  evader  chooses  u  =  —  1  to  avoid  PI  and,  by  doing  so,  initially  turns 
towards  P2.  However,  she  has  enough  time  to  complete  the  turn  without  interception  by  P2, 
and  thus  escapes. 
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□ 


(a)  (b) 

Figure  18:  The  case  of  E  =  0.  All  pursuers  are  located  outside  of  A  and  B1  no  ZNP  solution  passes  through 
their  location,  and  the  evader  can  use  either  one  of  his  extremal  controls  to  escape:  (a)  u=-l  and  (b)  u=+l. 
In  both  cases,  the  barrier  surface  with  a  solid  line  corresponds  to  the  selected  control,  while  the  one  with 
the  dashed  line  corresponds  to  the  alternative  choice.  Both  cases  lead  to  escape.  Since  a  policy  for  capture 
(be  it  optimal  or  not)  does  not  exist,  a  pure  pursuit  policy  is  assigned  to  both  pursuers,  for  the  purposes  of 
simulation. 

We  now  proceed  the  next  statement 

Proposition  2.  If  there  is  at  least  one  pursuer  in  An  B,  collision  occurs. 


Proof.  This  follows  immediately  from  the  SPG  solution.  Indeed,  picking  an  arbitrary  pursuer  in 
An  B  and  disregarding  all  the  rest  of  the  pursuers  of  the  game,  capture  is  guaranteed  because  the 
pursuer  is  located  in  the  capture  zone  delineated  by  the  SPG,  and  thus  the  evader  has  no  strategy 
that  leads  to  evasion  against  that  particular  pursuer.  Notice  that  this  initial  condition  corresponds 
to  £  =  2.  □ 


Before  obtaining  necessary  and  sufficient  conditions  for  evasion,  the  following  Lemma  will  be 
useful: 

Lemma  5.  If  there  is  at  least  one  purser  both  in  A  \  (An  B)  and  in  B  \  (An  B),  but  none  in 
An  B,  capture  occurs  simultaneously  by  at  least  one  pursuer  from  A  \  (A  n  B)  and  one  pursuer 
from  B  \  (An  B). 
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Proof.  It  is  obvious  that  a  constant  evader  control  u  =  —  1  or  a  =  +1  will  lead  to  capture  by  one  of 
the  pursuers,  since  for  both  control  inputs  u  =  —  1  and  u  =  +1,  a  time  optimal  trajectory  leading 
to  capture  generated  by  the  ZNP  solution  and  passing  through  pursuer  positions,  exists.  Any 
intermediate  constant  value  u  G  [—1,1]  will  also  yield  capture,  since  the  surface  on  the  Euclidean 
plane  covered  by  family  of  time  optimal  trajectories  generated  by  the  ZNP  formula,  is  minimal 
for  the  extreme  evader  control  values;  any  intermediate  constant  value  of  u  will  yield  sets  A'  and 
B'  which  are  supersets  of  A  and  B  respectively.  The  singular  case  u  =  0  is  no  exception,  since 
Zermelo’s  law  for  the  pursuers  then  reduces  to  parallel  navigation,  which  again  leads  to  interception. 

Simultaneous  capture  occurs  because  of  the  symmetry  of  the  problem  at  hand.  To  this  end, 
assume,  ad  absurdum  that  the  evader  gets  intercepted  by  only  one  pursuer,  namely  the  one  on  his 
right  (without  loss  of  generality).  Then,  just  before  interception  occurs,  the  evader  could  prolong 
his  time  to  capture  by  turning  to  the  left,  towards  the  other  pursuer.  Thus,  an  optimal  evader 
maneuver  leads  to  simultaneous  capture  by  at  least  one  pursuer  of  each  domain.  □ 

Noticing  that  the  initial  condition  described  in  Lemma  1  corresponds  to  E  =  2  and  combining 
Proposition  1,  Proposition  2  and  Lemma  1,  the  following  corollary  is  obtained  immediately. 

Corollary  2.  Collision  occurs  if  and  only  if  S  =  2. 

Corollary  1  summarizes  the  necessary  and  sufficient  conditions  for  evasion,  under  Assumption  1, 
in  terms  of  purser  initial  positioning  on  the  plane.  It  demonstrates  that,  once  the  initial  positions 
of  all  pursuers  have  been  specified,  the  game  outcome  is  determined.  Furthermore,  Corollary  1 
defines  which  areas  of  the  Euclidean  plane  are  considered  “danger  zones” ,  in  the  sense  that  placing 
at  least  one  pursuer  at  that  location  leads  to  a  different  game  outcome. 


3.2  Optimal  Evader  Strategies  in  a  Two-Pursuer  One-Evader  Problem 

The  problem  under  investigation  in  this  section  was  motivated  by  the  following  situation:  consider 
a  decoy,  whose  speed  is  limited,  entering  a  defense  area  guarded  by  multiple  agents  following  a 
prescribed  pursuing  protocol.  The  decoy’s  objective  is  to  avoid  capture  as  long  as  possible  so  that 
it  can  ’’buy  more  time”  for  the  other  evader (s).  The  goal  is  to  find  the  optimal  evading  strategy  to 
maximize  the  time-of-capture. 

We  assumed  a  simplified  version  of  this  problem,  involving  only  two-pursuers  and  one  evader. 
Both  pursuers  implement  a  relay-pursuit  strategy,  according  to  which  only  one  pursuer  is  assigned 
to  go  after  the  target  at  every  instant  of  time.  The  latter  is  the  active  pursuer,  while  the  other 
pursuer  is  designated  as  the  inactive  pursuer.  This  strategy  may  be  desirable  in  cases  when  the 
pursuing  agents  play  a  dual  role,  namely,  both  as  pursuers  and  as  guardians  protecting  some  area 
of  interest,  or  when  the  power  or  energy  consumption  of  the  pursuing  agents  needs  to  be  taken  into 
consideration. 
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In  our  problem  formulation,  we  also  assumes  that  the  pursuers  have  a  stroboscopic  view  of  the 
evader’s  position,  i.e. ,  at  every  instant  of  time,  the  pursuers  only  know  the  current  position  of  the 
evader  but  not  its  velocity.  It  is  further  assumed  that  the  evader  is  slower  than  the  two  pursuers, 
but  is  aware  of  their  strategies  and  current  positions. 


Problem  Formulation 

Due  to  the  symmetry  of  the  problem,  when  the  evader  resides  on  the  Voronoi  cell  boundary, 
we  can  assign  any  one  of  the  two  pursuers  to  be  the  active  pursuer.  Therefore,  throughout  the 
pursuit  process,  we  can  fix  one  pursuer  to  be  the  active  pursuer,  while  the  inactive  pursuer  remains 
stationary  on  the  plane,  whose  mere  presence,  however,  imposes  a  state  restriction,  namely  that 
the  evader  does  not  enter  the  interior  of  its  corresponding  Voronoi  cell. 

The  equivalent  one-pursuer  one-evader  optimal  control  problem  with  state  constraint  can  be 
expressed  in  a  reduced  three-dimensional  space  by  fixing  the  origin  of  a  new  coordinate  system  to 
the  active  pursuer  and  by  aligning  the  positive  direction  of  the  x-axis  with  the  direction  of  the  active 
pursuer’s  velocity  vector.  Let  us  denote  this  frame  by  AL  In  the  frame  A4,  the  active  pursuer  is 
fixed  at  the  origin,  and  the  motion  of  the  evader  is  restricted  to  lie  on  the  x-axis  due  to  the  pure 
pursuit  strategy  of  the  active  pursuer.  The  inactive  pursuer,  however,  is  no  longer  stationary  in 

M. 


Let  9  =  (j)E  —  (j)P  be  the  angle  between  the  active  pursuer’s  and  evader’s  velocity  vectors.  Then 
4>p  is  given  by 


usin# 


xK  -  xf 


(78) 


Let  the  coordinates  of  the  inactive  pursuer  and  the  evader  be,  respectively,  xG  =  (77,  Q  and 


xE  =  (y,  0)  in  the  frame  A4,  as  shown  in  Figure  19  The  equations  of  motion  in  the  frame  A4  are 
then  given  by 


while  the  constraint  is  given  by 


X  =  —u  +  v  cos  9, 


f/  =  —u  +  v—  sin  9, 
X 

Q  =  —  v—  sintx, 

X 


5(0  =  \{x2  -  (v  -  xf  -  C2)  <  0, 


(79) 

(80) 
(81) 


(82) 
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Figure  19:  Geometry  of  evader  (E),  active  (P)  pursuer,  and  inactive  (G)  pursuer  in  the  inertial  (Z) 
and  the  active  pursuer  frames  (M). 


where  £  =  [x,  i),  C]T  £  K3-  The  boundary  conditions  in  frame  M.  are  given  by 


X(0)  =  ||xB(0)  -xP(0)||,  x(t/)  =  e, 

7](0)  =  ||xP(0)||  cos(7r  +  t/>(0)  -  (j)P( 0)),  r](tf)  free,  (83) 

C(0)  =  ||xP(0)||  sin(7r  +  V’(O)  -  (j)P( 0)),  C (tf)  free, 


where  i/j( 0)  is  the  initial  value  of  the  angle  =  atan (yP,xP),  as  shown  in  Figure  19  Given  (79) 


(81),  the  constraint  (82)  and  boundary  conditions  (83),  our  goal  is  to  find  the  optimal  control  6  to 


maximize  tr 


Problem  Analysis  and  Main  Results 


Consider  the  optimal  control  problem  ( 79 )-( 83 ) .  Assume  that  u  >  v  >  u/2  and  assume  that 
the  initial  conditions  are  such  that  the  constraint  is  active.  Utilizing  the  Pontryagin’s  Minimum 
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Principle,  our  analysis  has  shown  that  the  optimal  control  of  the  evader  is  given  as  follows: 

A2C  ~ 


tan  0*(t)  =  < 


AiX  ’ _ 

q  —  op\Jp2  +  q2  —  1 


t  <E  [0,ti], 
t  €  [ri,r2], 


p  +  oqsjp2  +  q2  —  1 

0,  iefajic], 


where  p  =  vq/(ux),q  =  v(/(ux),v  =  sgn(q).  Furthermore,  r2  satisfies  the  switching  condition: 

-  ux(t2 )  =  0.  (84) 

Summarizing  the  previous  analysis,  we  conclude  that  the  optimal  trajectory  of  the  evader  in¬ 
volves  three  periods:  first  she  moves  in  the  Voronoi  cell  of  the  active  pursuer  in  a  way  such  that  the 
optimal  conditions  (transversality  condition,  Erdmann  corner  condition,  etc.)  are  satisfied  before 


she  hits  the  boundary,  then  she  moves  along  the  boundary  until  the  switching  condition  (84)  is 
satisfied,  finally  the  evader  moves  along  the  LoS  till  capture  occurs. 

Given  the  analysis  of  the  previous  section,  one  can  compute  numerically  the  evader’s  optimal 
trajectory.  Note,  however,  that  the  second  and  third  period  strategies  can  not  be  easily  implemented 
without  resorting  to  the  solution  of  a  two-point  boundary  value  problem.  Therefore,  we  hereby 
propose  a  suboptimal  control  scheme  for  the  evader  that  can  be  computed  analytically.  It  is 
presented  as  follows:  The  evader  first  moves  along  the  LoS  away  from  the  pursuer  before  it  hits 
the  boundary.  Then  she  moves  on  the  boundary  until  the  switching  condition  (84)  is  satisfied. 
Afterwards,  the  evader  resumes  moving  along  the  LoS  until  capture  occurs.  Our  simulation  results 
show  that  the  relative  differences  between  the  time-to-capture  by  applying  this  strategy  and  the 
optimal  one  are  always  less  than  5%,  given  different  speed  ratio  and  initial  positions  of  the  evader 
and  the  pursuer. 

The  main  idea  of  the  proposed  suboptimal  evader  control  strategy  is  that  the  evader  moves  along 
the  LoS  whenever  it  is  able  to,  otherwise  it  will  move  along  the  Voronoi  cell  boundary.  We  can 
further  generalize  this  idea  and  propose  a  suboptimal  evading  strategy  for  the  nrultiple-pursuer/one- 
evader  relay  pursuit  problem.  In  this  case,  there  will  be  one  active  pursuer  and  multiple  inactive 
pursuers  that  play  the  role  of  guards.  The  objective  of  the  evader  is  to  maximize  the  instantaneous 
velocity  component  along  the  LoS  at  every  instant  of  time  without  violating  the  state  constraints 
imposed  by  the  fact  that  the  evader  never  enters  the  interior  of  the  Voronoi  cells  of  an  inactive 
pursuer. 

We  have  implemented  the  generalization  of  the  suboptimal  evading  strategy  to  a  three-pursuer /one- 
evader  relay  pursuit  problem  and  compared  the  result  with  the  numerical  optimal  control  of  the 
evader.  The  numerical  result  is  shown  in  Figure  20  The  corresponding  optimal  time-to-capture 
is  tc  =  3.1530.  The  evader’s  trajectory  generated  by  applying  the  evading  strategy  we  proposed, 
and  the  active  pursuer’s  trajectory  are  presented  in  Figure  [20]  in  red  and  green  lines,  respectively. 
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As  seen  from  Figures  |20|a)  and  |20|b)  the  trajectories  are  very  similar.  The  time-to-capture  is 
t'c  =  3.1366.  The  relative  time  difference  this  time  is  A  =  0.52%. 


"A 


A, 


-2  -1.5  -1  -0.5  0  0.5  1  1.5 

X 


(a) 


iL 


A, 


A. 


Am 


-2  -1.5  -1  -0.5  0  0.5  1  1.5 

X 


(b) 


Figure  20:  (a)  Trajectory  of  the  active  pursuer  (in  red)  and  optimal  trajectory  of  the  evader  (in 
green)  obtained  from  GPOPS.  The  cyan  line  represents  the  Voronoi  diagram  of  the  three  pursuers 
with  initial  conditions,  (b)  Trajectories  generated  by  applying  the  proposed  suboptimal  evading 
strategy. 


3.3  Pursuit  Evasion  Game  of  Two  Players  under  an  External  Flow  Field 

One  common  assumption  among  all  previous  results  on  pursuit  and  evasion  has  been  that  no 
environmental  conditions  may  affect  the  outcome  of  the  game.  However,  when  either  the  pursuer 
or  the  evader  (or  both)  is  a  small  autonomous  underwater  vehicle  (AUV)  or  an  unmanned  aerial 
vehicle  (UAV),  the  presence  of  the  sea  current  or  the  wind,  will  significantly  affect  their  motion. 
As  a  result,  their  behavior  and  the  solution  of  the  differential  game  may  be  greatly  affected  by  the 
existence  of  the  external  flow  field.  To  fill  this  gap  in  the  literature,  we  considered  the  differential 
game  of  pursuit  and  evasion  between  two  players  on  a  plane  under  an  external  flow  field.  It  is 
assumed  that  the  pursuer  and  the  evader  move  with  constant  but  different  speeds,  and  they  are 
both  agile,  that  is,  they  are  allowed  to  change  their  headings  instantaneously.  To  simplify  the 
analysis,  it  was  assumed  that  the  flow  field  is  approximated  by  a  time- invariant,  spatially-affine 
function.  Our  main  goal  was  to  find  the  region  of  initial  conditions  of  both  players  that  leads  to 
capture  when  both  players  act  optimally,  and  derive  the  corresponding  optimal  strategies  of  the 
two  players  when  capture  is  guaranteed. 
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Problem  Formulation 


Consider  a  pursuer  and  an  evader  moving  on  a  plane  under  the  influence  of  an  external  flow  field. 
The  equations  of  motion  for  the  pursuer  and  the  evader  in  the  inertial  reference  frame  are  given  by 


xP  =  vP  cos  (p  +  wi(xP,  yP), 

(85) 

ijP  =  vP  sin  (p  +  w2(xP,yP), 

(86) 

xE  =  cos  ip  +  wi(xE,yE), 

(87) 

yE  =  vE  sin  ip  +  w2(xE,  yE), 

(88) 

where  (xP,yP)  G  M2  and  (xE,yE)  G  M2  denote  the  positions  of  the  pursuer  and  the  evader,  re¬ 
spectively,  cp,  ip  G  [— 7T,  7r]  are  the  control  of  the  pursuer  and  the  evader,  and  vP  and  vE  represent 
the  speed  of  the  pursuer  and  the  evader,  respectively.  In  this  work,  we  will  assume  that  vP  >  vE. 
Finally,  uq(-,-)  and  are  the  components  of  an  external  spatially  varying  flow  field  along 

x-axis  and  y-axis,  respectively. 

In  order  to  simplify  the  analysis,  it  will  be  assumed  that  the  external  flow  field  is  approximated, 
at  least  locally,  by  a  time-invariant  affine  function.  Specifically,  let 

wi  (x,  y)  =  aqx  +  piy  +  71 ,  (89) 

w2(x,y)  =  a2x  +  /32y  +  72,  (90) 

where  a*,  /%,  7 \  G  M  (i  =  1,2)  are  prescribed  constants.  By  choosing  a  new  reference  frame  whose 
origin  is  at  the  pursuer,  the  kinematic  equations  can  be  represented  in  a  reduced  two-dimensional 
space  space.  In  particular,  let  x  =  xE  —  xP  and  y  =  yE  —  yP  be  the  relative  distance  between  the 
evader  and  the  pursuer  along  the  x-axis  and  y-axis,  respectively.  The  kinematic  equations  in  terms 
of  x  and  y  are  then  given  by 


x  =  vE  cos  ip  —  vP  cos  cp  +  a\x  +  j3\ y,  (91) 

y  =  vEsinip  —  vPsm<p  +  a2x  + 02y.  (92) 

By  defining  the  reduced  state  as  x  =  [x,  y]T,  the  equations  can  then  be  written  compactly  as 

x  =  rEv  -  UpU  +  w(x),  (93) 


is  given  by  w(x)  =  Ax,  where  A  = 


.  The  game  terminates  when  capture  occurs,  that  is, 


where  v  =  [cos  ip,  sin  tp]  and  u  =  [cos  (p,  sinc/>]T  are  the  controls,  and  where  the  relative  wind  field 

/?i 

«2  P  2 

when  the  evader  is  in  the  interior  of  a  ball  B  of  radius  t  centered  at  the  pursuer’s  current  location, 
given  by  B  =  {x  G  M2  :  |x|  <  £}.  The  terminal  surface  is  the  manifold  in  the  state  space  which, 
once  penetrated,  determines  termination  of  the  game.  The  terminal  surface  C  is  thus  the  circle 
centered  at  the  origin  of  radius  £,  i.e.,  C  =  {x  G  M2  :  |x|  =  £}.  Accordingly,  the  state  space  £  is  the 
portion  of  the  x,  y-plane  exterior  to  C,  that  is,  £  =  {x  G  R2  :  |x|  >  £}. 
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We  want  to  find  the  region  in  the  state  space  such  that  the  evader  can  be  captured  by  the 
pursuer  if  their  initial  relative  coordinates  fall  inside  this  region.  This  region  is  denoted  as  the 
capture  zone.  The  region  which  leads  to  escape  of  the  evader  is  the  escape  zone. 


Problem  Analysis 

To  analyze  the  problem  we  followed  the  standard  approach  of  the  game  of  kind  of  Isaacs.  First,  we 
focused  on  identifying  the  usable  part  of  the  terminal  surface.  The  UP  is  the  subset  of  C  in  which 
the  pursuer  can  cause  termination  immediately  when  both  players  act  optimally.  The  remaining 
points  on  C  form  the  nonuseable  part,  that  is,  termination  will  not  occur  even  if  the  trajectory 
reaches  this  part  of  C  under  optimal  play  (i.e.,  when  both  players  act  optimally).  The  part  of  C 
that  separates  the  usable  part  and  the  nonuseable  part  of  C  is  called  the  boundary  of  the  usable 
part  (BUP). 

In  order  to  find  the  usable  part,  we  parameterize  C  with  the  variable  s  according  to 

x  =  icoss,  y  =  £sins.  (94) 

Let  r2  =  x2  +  y 2  =  |x| 2 .  Taking  the  time  derivative  on  both  sides  of  the  last  equation,  and  consider 
only  points  on  C,  we  have 

If  =  t  cos  s{vE  cos  if  —  vP  cos  4>  +  a\t  cos  -s  +  fi\£  sin  s') 

+  £sins(uB  sin  if  —  vP  sin^  +  a2£cos  s  +  /32£  sin  s). 

The  usable  part  of  C  is  specified  by  the  condition 

minmaxf(x)  <0,  x  G  C,  (95) 

<t>  ^ 

which  implies  that  the  relative  trajectory  is  able  to  penetrate  the  terminal  surface  C.  From  the  last 
two  inequalities,  and  using  standard  trigonometric  identities,  we  have  that,  for  x  £  C, 

H 

min  max  r(x)  =  vE  —  vP  +  -(aq  +  /?2) 

4>  if  2 

+  ^t(ai  “  #2)  cos  2s  +  (/3i  +  a2)  sin  2s].  (96) 

Let  now  a  =  y/ (aq  —  /32)2  +  (Pi  +  «2)2-  When  a  =  0,  we  have  aq  —  /32  =  0  and  (3\  +  a2  =  0. 

Hence,  whether  the  game  terminates  depends  on  the  sign  of  vE  —  vP  +  l{a\  +  /32 ) / 2 .  Specifically, 
when  vE  —  vP  +  £(a±  +  /32)/2  <  0,  the  usable  part  of  the  terminal  surface  is  C  itself,  whereas  when 

vE  —  vP  +  £(oq  +  /32) / 2  >  0,  the  game  will  not  terminate  under  any  initial  conditions  of  the  pursuer 

and  the  evader,  which  means  that  the  evader  always  escapes.  In  the  latter  case  the  whole  state 
space  £  is  the  escape  zone. 
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Henceforth,  we  assume  that  a/0.  Then  (95)  and  (|96j)  imply  that 

la 


min  max  r  =  vE  —  vP  +  -  (on  +  fa)  +  —  sin(#  +  2s)  <  0, 

<t>  1 j)  2  2 


(97) 

where  9  is  given  by  sin#  =  {a\  —  fa)/ a  and  cos 9  =  (/3i  +  0,2) /a.  From  ([97])  we  reach  the  conclusion 
that  in  the  reduced  space  the  game  will  not  terminate  if 

2 (yP  -  vE)  -  l{a\  +  fa) 


la 


<  -1, 


(98) 


where  a  =  \J (a±  —  fo)2  +  (/?i  +  ct^)2-  Finally,  when  (  >  1  the  usable  part  is  the  whole  terminal 
surface  C. 


Henceforth,  we  assume  that  —1  <  £  <  1.  Under  this  assumption,  the  BUP  is  determined  from 
sin(0  +  2s)  =  C-  This  yields  four  solutions  in  [0,  2n),  denoted  by  si,  S2,  S3  =  si  +  7 r,  S4  =  S2  +  ^ r. 
Hence,  the  BUP  contains  four  points  on  C,  represented  by  Pj  =  (cos  Si,  sin  Si),  i  =  A 

typical  illustration  of  the  terminal  surface,  which  is  divided  into  the  usable  and  nonusable  parts  by 


the  BUP,  which  consists  of  four  points  on  the  terminal  surface,  is  shown  in  Figure  21 


Figure  21:  The  terminal  surface  C  of  the  game  is  given  by  a  circle  of  radius  l.  The  circle  is  separated 
by  the  BUP  (4  points  on  the  circle  parameterized  by  si  through  54)  into  the  usable  part  (black 
lines)  and  the  nonusable  part  (red  lines).  Every  barrier  meets  the  terminal  surface  at  the  BUP 
tangentially. 


Now  we  turn  to  the  construction  of  the  barrier.  The  barrier  is  a  surface  in  the  state  space  that 
consists  of  initial  conditions  for  which  the  outcome  is  neutral.  One  property  of  the  barrier  is  that  it 
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is  never  crossed  by  either  the  pursuer  or  the  evader  during  optimal  play.  In  particular,  the  barrier 
emanates  from  the  BUP  and  is  tangent  to  C  at  the  BUP.  We  denote  the  barrier  by  S.  At  each 
point  on  S  we  define  the  normal  vector  u  =  [v\,  v?\ T  £  M2  extending  into  the  escape  zone. 


Denoting  with  ( ° )  the  derivative  with  respect  to  r,  the  retrograde  evolution  of  the  states  and 
the  vector  u  can  be  established  as: 


x  =  (vP  -  vE)—  -  aix  -  fay, 
P 

y=(vP-  vE)  —  -  a2x  -  fay, 
P 

v\  =  a\V\  +  a2v2, 

"2  =  +  /32U  2. 


(99) 


By  the  dehnition  of  the  BUP  and  the  barrier,  it  is  clear  that  the  barrier  starts  at  the  BUP  towards 
the  state  space  £  in  a  retrogressive  sense.  Moreover,  the  two  surfaces  meet  tangentially,  since  no 
penetration  occurs  at  the  BUP  and  the  vectorhelds  of  both  players  are  tangential  to  the  barrier. 
Pick  any  s  £  {si,  s2,  S3,  S4}.  The  initial  conditions  for  the  RPEs  are  thus  given  by  x(r  =  0)  = 
I  cos  s,  y(r  =  0)  =  l  sin  s,  (r  =  0)  =  coss,  v2  (r  =  0)  =  sins.  By  integrating  (99)  subject  to  these 
initial  conditions  we  obtain 


v(t)  =  caTtv(t  =  0), 

x(r)  =  e~Arx(T  =  0)  +  [  e-A(T~®b{Z)  d^, 

Jo 

where  b(r)  =  ( vP  —  ve)v(t) /\v{t)\.  Therefore, 
x(t)  =  ie 


(100) 

(101) 


te~Ar 

cos  s 

rr  e(A+AT)£ 
+  (vP  vE)e  At  / 

cos  s 

sin  s 

Jo  KOI 

sin  s 

d£- 


(102) 


By  plotting  the  trajectories  of  ( 102 )  given  the  four  initial  conditions  of  the  BUP,  we  can  determine 
whether  these  are  valid  barriers  and  whether  the  state  space  £  is  separated  by  the  barriers. 


Game  of  Degree  in  the  Capture  Region 

Now  that  we  have  identified  the  capture  region,  we  aim  at  determining  the  optimal  trajectory  of  x 
inside  this  region  by  solving  a  game  of  degree.  Within  the  capture  region,  the  performance  index 
is  J  =  /0*cdt.  To  this  end,  we  define  the  value  function  V(x),  which  satisfies  the  Hamilton- Jacobi- 
Isaacs  (HJI)  equation 


0  =  minmaxH(x,  Vx), 

<j>  ip 


(103) 
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where  the  Hamiltonian  H  is  given  by 


<9V  ( 

H  =  1  +  I  vE  cos  if)  —  vP  cos  cp  +  a\x  +  fay 


dX  ( 
+  faj  V 


vE  sin  ip  —  vP  sin  (p  +  o.2x  +  fay 


(104) 


Let  Vr  = 


x  =  Xy  =  Then  (103)  can  be  rewritten  as 

0  =1  +  Xx(a1x  +  fay)  +  Xy(a2x  +  fay) 

+  min{— vP(yx  cos  (p  +  Xy  sin  cp)} 

<t> 

+  max{n£;(Vx  cosip  +  Xy  sin^)}- 

Hence,  the  optimal  controls  (fa  and  ip*  are  given  by 

1  *  Xx  -i*  V v 

cos  (p  =  — ,  sm0  =  — 

y  y 

I  *  ■  I  *  Vy 

cos  ip  =  — ,  sin^  =  — 

y  y 


dv 


(105) 


(106) 


where  y  =  +  Vy.  Plugging  ( 106 )  back  into  the  Hamiltonian,  we  get  the  optimal  Hamiltonian 


H*  =  1  +  [yE  —  vP)y  +  X  x(ot\x  +  fay)  +  Xy(a2x  +  fay). 
The  RPEs  can  then  be  expressed  as 

x  =  (vP-  vE)  —  -  aix  -  fay , 
y  =  (vP-  vE)  —  -  a2x  -  fay , 

/i 

Xx  =  (XlX  X  +  OL2  Vy, 

Vy  =  faXX  +  faXy. 


(107) 

(108) 

(109) 

(110) 
(111) 


On  the  terminal  surface  C,  we  have  V  =  0.  Along  with  the  parameterization  of  C  by  x  =  7coss, 
y  =  £sins,  we  get 

dX 

0  =  — —  =  1{—Xx  sins  +  V„  cos  s). 
os  y 

Upon  solving  these  equations,  we  further  get,  for  some  <5  >  0, 

Xx(t  =  0)  =  <5  cos  s,  Vy(r  =  0)  =  <5  sin  s.  (112) 
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By  substituting  (|1 12 )  into  the  expression  for  H*,  one  can  solve  for  5  to  obtain  5  =  1  / 77  where 
r/  =  vP  —  vE  —  £{a\  cos2  s  +  ((3i  +  0:2)  sin  s  cos  s  +  (3 2  sin2  s).  Integrating  the  RPE’s  ( 108 )  through 
(111)  subject  to  the  initial  conditions  (112)  yields 


"V  x  (r )" 

Att 

5  cos  s 

V,(r)_ 

=  e 

5  sin  s 

and  hence 


x(t) 

—  At 

£  cos  s 

y(r) 

=  e 

£  sin  s 

+  {vp 


vE)e 


^  nr  e(A+A T)£ 

5  cos  s 

Jo 

5  sin  s 

d£, 


which  yields  the  optimal  trajectory  for  the  game  of  degree. 


(113) 


(114) 


Simulation  Results 


In  this  section,  we  present  numerical  simulations  to  illustrate  the  previous  analysis.  In  the  following 
cases,  we  vary  the  matrix  A  while  we  keep  £,  vP  and  vE  fixed  to  compute  different  types  of  barriers 
under  different  flow  fields.  Henceforth,  we  let  £  =  0.1,  vP  =  1.0,  vE  =  0.9.  For  the  parameters  of 
the  flow  field  in  the  inertial  frame,  we  set  71  =  72  =  0. 


Case  1 :  A  = 
a  =  5,  £  =  0.4  anc 


.  This  matrix  has  two  pure  imaginary  eigenvalues  (center).  In  this  case, 


0  10 
-5  0 

the  corresponding  values  for  the  BUP  are  si  =  0.2058,  S2  =  1.3650,  S3  =  3.3474 
and  S4  =  4.5066.  As  shown  in  Figure  [22{a),  the  trajectories  of  the  RPEs  emanating  from  Pi  and 
P3  are  inside  £>;  these  two  trajectories  are  outside  the  state  space  £.  Hence,  they  are  not  valid 
barriers  and  are  discarded.  On  the  other  hand,  the  trajectories  emanating  from  P2  and  P4  are  valid 
barriers.  They  have  spiral-like  shapes  but  they  fail  to  separate  the  state  space  into  two  parts.  The 
whole  state  space  is  a  capture  zone;  regardless  of  the  initial  conditions  of  the  two  players,  capture 
is  guaranteed.  The  dashed  magenta  lines  in  Figure  [22|(a)  show  the  optimal  trajectories  in  relative 
coordinates  with  respect  to  different  initial  positions  on  the  usable  part  of  the  terminal  surface. 
Although  the  barrier  does  not  separate  the  state  space  into  capture  and  escape  zones,  it  is  still  not 
crossed  during  optimal  play,  which  gives  us  some  information  as  to  how  the  optimal  trajectories 
look  like.  The  barrier  also  marks  a  discontinuity  in  the  value  function. 

Given  the  initial  positions  for  the  evader  and  the  pursuer  as  xp(0)  =  [—0.764,  0.337]  and  xp(0)  = 
[—0.524,0.336],  respectively,  the  optimal  trajectories  of  the  evader  and  the  pursuer  in  the  inertial 


frame  are  depicted  in  Figure  22  b).  These  trajectories  are  consistent  with  the  external  flow  field 


represented  by  the  black  arrows.  The  results  suggest  that  both  players  are  trying  to  take  advantage 
of  the  flow  field.  Intuitively,  this  makes  sense.  Since  the  matrix  A  has  purely  imaginary  eigenvalues, 
the  uncontrolled  system  trajectories  are  circles  around  the  origin.  The  flow  field  does  not  give  an 
advantage  to  either  the  pursuer  or  the  evader.  It  is  then  reasonable  that  under  optimal  controls 
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Figure  22:  (a)  Barriers  in  frame  A4  when  A  =  [0,10;— 5,0].  The  dashed  magenta  lines  are  the 
optimal  trajectories  of  the  relative  coordinates  emanating  from  the  usable  part  of  the  terminal 
surface,  (b)  Optimal  trajectories  of  the  evader  in  green  and  the  pursuer  in  dashed  magenta, 
respectively,  where  A  =  [0, 10;  —5,  0].  Red  circle  around  the  final  position  of  the  pursuer  represents 
the  terminal  surface.  The  flow  field  is  depicted  by  the  black  arrows  in  the  background. 


of  both  players,  the  trajectories  in  the  reduced  state  move  in  spiral-like  patterns,  as  confirmed  in 
Figure  [2^](  a). 


Case  2:  A 


1.4020 

1.4770 


-1.0772 

0.7756 


.  In  this  case,  the  eigenvalues  are  a  complex  conjugate  pair 


with  positive  real  part  (unstable  spiral).  These  values  correspond  to  a  =  0.7431,  (  =  —0.2390  and 
the  corresponding  parameters  for  the  BUP  are  si  =  5.6612,  S2  =  1.1901,53  =  2.5196,  S4  =  4.3317. 
As  depicted  in  Figure  [23](  a),  and  similarly  to  the  first  case,  the  trajectories  of  the  RPEs  emanating 
from  P2  and  P4  are  inside  B  and  are  thus  discarded.  The  trajectories  starting  from  P\  and  P3 
intersect  C  after  some  time,  and  thus  the  trajectories  after  the  intersection  are  discarded.  In  this 
case,  the  barrier  separates  the  capture  zone  from  the  escape  zone.  The  capture  zone  is  represented 
by  the  shaded  region  in  Figure |23|a).  All  the  remaining  space  outside  the  circle  is  the  escape  zone. 
The  optimal  trajectories  of  the  evader  and  the  pursuer  in  the  inertial  frame  are  depicted  in  Figure 


that  in  this  case,  there  is  a  small  region  of  relative  initial  positions  for  the  pursuer  and  the  evader 
such  that  capture  occurs. 


23  b)  with  initial  positions  x#(0)  =  [—0.20,0.683] 


cllIU.  A P\yJ J 


In  this  case  the  matrix  A  has  two  complex  eigenvalues  with  positive  real  parts,  which  implies 
that  the  origin  is  an  unstable  spiral.  The  trajectories  of  the  uncontrolled  system  would  result  in 
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(a)  (b) 

Figure  23:  (a)  Barriers  in  frame  M.  for  A  =  [1.4020,  —1.0772;  1.4770,  0.7756],  the  shaded  region  is 
the  escape  zone  and  the  white  region  outside  the  circle  is  the  escape  zone,  (b)  Optimal  trajectories 
of  the  evader  in  green  and  the  pursuer  in  dashed  magenta,  for  A  =  [1.4020,  —1.0772;  1.4770,  0.7756]. 


|x|  ->  oo  as  time  goes  on.  In  this  case,  the  flow  field  gives  an  advantage  to  the  evader.  Indeed,  as 
shown  in  Figure  [23](a) ,  the  capture  zone  is  very  small  compared  to  the  escape  zone. 


Case  3:  A 


1  2 
2  1 


In  this  case,  a  =  4,  (  =  0  and  the  corresponding  parameters  for  the  BUP 


are  si  =  0,S2  =  7t/2,S3  =  -k  and  S4  =  37r/2.  As  illustrated  in  Figure  [24|'a),  all  four  trajectories 
emanating  from  Pi,  P2,  P3  and  P4  are  valid  barriers.  They  separate  the  state  space  into  two 
capture  zones  and  two  escape  zones,  depicted  in  the  figure  by  the  two  shaded  regions  and  the 
two  white  regions,  respectively.  Typical  optimal  trajectories  of  the  evader  and  the  pursuer  in  the 
inertial  frame  with  initial  positions  xp(0)  =  [0.951,  —0.852]  and  xp(0)  =  [1.265,  —1.165]  are  shown 
in  Figure  [24](b) . 


In  Cases  3,  the  matrix  A  has  one  positive  eigenvalue  and  one  negative  eigenvalue.  Hence,  the 
origin  is  a  saddle  point,  and  in  some  part  of  the  plane  the  flow  field  points  towards  the  origin  (helping 
the  pursuer),  whereas  in  other  parts  it  points  away  from  the  origin  (thus  giving  an  advantage  to 
the  evader),  as  indicated  by  the  black  vector  fields  in  Figure |24)(b).  This  suggests  that  the  pursuer 
tries  to  steer  the  game  in  the  part  of  the  space  that  the  flow  field  is  beneficial  to  him  and  the  evader 
does  the  same,  i.e. ,  tries  to  steer  the  state  to  the  parts  of  the  state  space  that  are  more  helpful  to 
him.  In  this  case,  the  game  will  terminate  (or  not)  depending  on  whether  the  pursuer  can  capture 
the  evader  before  the  latter  moves  in  the  part  of  the  space  that  the  former  has  an  advantage. 


51 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


2 

1.5 

1 

0.5 


0-/ 


/  /  / 
-i  Sy  s  s 

S  S  S’ 

-T - 1 - 

-1  -0.5 


SSS/////Z 


/  s  s  /  / 

s  S  S  /  / 


0.5  1  1.5 


/  /  /  / 

/  /  /  / 

/  /  /  / 

/  /  /  / 

/  /  /  / 

/  /  /  / 

till 
I  f  t  f 

t  f  t  f 

\  1  t  f 

\  \  1  t  1 

\  \  \  \  \ 

2  2.5  3 


(a) 


(b) 


Figure  24:  (a)  Four  valid  barriers  in  frame  M.  emanating  from  P\  through  P4,  for  A  =  [1,2;  2, 1], 
The  state  space  is  divided  by  two  shaded  capture  zones  and  two  white  escape  zones,  (b)  Optimal 
trajectories  of  the  evader  in  green  and  the  pursuer  in  dashed  magenta,  for  A  =  [1,  2;  2, 1], 


3.4  Sequential  Pursuer- Target  Assignment  Problem  Under  External  Distur¬ 
bances 

Consider  a  scenario  where  a  group  of  helicopters  or  small  UAVs  in  a  wind  field  are  trying  to  capture 
a  vehicle  moving  on  the  ground,  or  a  team  of  small  marine  or  underwater  vehicles  attempting  to 
reach  a  ship  which  is  large  enough  so  that  the  sea  currents  do  not  significantly  affect  its  motion. 
Given  such  a  group  of  pursuers,  we  want  to  find  a  pursuit  strategy  to  intercept  the  target  in 
minimum  time.  Problems  of  this  nature  fall  under  the  general  class  of  group  pursuit  problems. 
These  are  difficult  problems  to  solve,  in  general.  Their  solution  is  also  based  on  the  information  the 
pursuers  and  the  target  have  about  each  other,  resulting  in  either  cooperative  or  non-cooperative 
strategies.  In  this  work,  in  order  to  solve  this  problem,  we  propose  a  sequential  pursuit  strategy.  By 
sequential  (or  relay)  pursuit  we  mean  that  only  one  pursuer  is  assigned  to  chase  the  target  at  every 
instant  of  time.  In  addition  to  simplifying  significantly  the  group  pursuit  problem,  a  relay  pursuit 
strategy  may  be  desirable  in  cases  where  the  power  consumption  of  the  agents  is  an  important 
factor,  when  the  agents  also  play  a  dual  role  as  guardians  protecting  a  certain  area,  or  in  order  to 
account  for  possible  deceptive  strategies  of  an  opponent. 

In  contrast  to  most  standard  pursuit-evasion  problem  formulations,  where  the  effect  of  the 
environment  is  not  taken  into  consideration,  in  our  problem  setup  (only)  the  pursuers  will  be 
affected  by  known  exogenous  environmental  conditions  (e.g.,  winds  or  sea  currents).  It  will  also  be 
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assumed  that  each  pursuer  has  a  stroboscopic  view  of  the  target  position.  That  is,  each  pursuer 
knows  the  current  position  of  the  target  but  not  its  future  position  nor  its  velocity.  Our  objective 
is  to  find  which  pursuer  will  go  after  the  target  at  each  instant  of  time  so  as  to  reduce  or  minimize 
the  capture  time. 

The  main  tool  we  use  to  find  “areas  of  influence”  of  each  pursuer  are  Zernrelo-Voronoi  diagrams. 


The  Zermelo-Voronoi  Diagram 

When  we  deal  with  pursuer-target  problems,  in  many  cases  we  want  to  know  the  proximity  relation 
between  a  set  of  agents,  acting  as  pursuers,  and  a  target  on  the  plane.  The  problem  of  obtaining 
this  proximity  relation  can  often  be  recast  as  a  set  membership  problem.  For  instance,  the  question 
of  determining  which  of  the  agents  is  closest  (in  terms  of  arrival  time)  to  a  static  target  at  a 
particular  instant  of  time,  reduces  to  a  set  membership  problem,  namely,  one  of  forming  the  so- 
called  Zermelo-Voronoi  Diagram  (ZVD),  and  then  finding  the  cell  in  which  the  target  resides  at  the 
given  time  instant.  We  state  the  precise  definition  of  the  ZVD  below. 

Definition  1.  Given  a  set  of  n  agents  starting  from  distinct  initial  positions,  whose  dynamics  are 
given  by 

XjP  =  vfP  +  w{XiP,t),  XiP(0)  =  XiPQ ,  (115) 

where  Xlp  :=  [xP,yP\T  £  M2  denotes  the  position  of  the  ith  agent,  ulP  £  M2  is  the  control  input  of 
the  ith  agent,  and  w(Xp,t.)T  £  M2  represents  the  wind  disturbance,  the  Zermelo-Voronoi  diagram 
(ZVD)  (or  Zermelo-Voronoi  partition)  is  a  set  partition  of  the  plane  Z  =  {Z i,  Z<± , ...,  Zn}  such  that 

i )  M2  =  ur=i^, 

ii)  For  any  point  in  Zi,  the  agent  i  will  reach  this  point  faster  than  any  other  agent. 

The  sets  Z{  are  the  Zermelo-Voronoi  cells  for  the  partition. 

It  turns  out  that  in  the  case  when  the  wind  field  is  only  time- varying,  w(Xp,t )  =  w(t),  there 
exists  a  homeomorphism  between  the  ordinary  Voronoi  diagram  and  the  Zermelo-Voronoi  diagram 
with  the  same  generators.  This  allows  the  construction  of  the  ZVD  efficiently  using  known  VD 
algorithms  from  computational  geometry. 


Problem  Setup 

Consider  a  group  of  n  pursuers  in  the  plane,  denoted  by  the  index  set  I  =  {1,2, ...,  n},  and  assume 
that  at  time  t  =  0  the  pursuers  are  located  at  n  distinct  positions  in  the  plane,  designated  by 
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Vo  =  {Xpo  E  M2,i  E  I}.  The  kinematics  of  the  ith  pursuer,  i  6  /,  are  described  by  (115),  where 
it  is  assumed  that  ulP  E  Up,  where  the  set  Up  consists  of  all  piecewise  continuous  functions  whose 
range  is  included  in  the  set  Up  =  {u  E  M2,  \u\  ^  «}.  It  is  assumed,  furthermore,  that  there  exists 
0  <  w  <  u  such  that 

\w(t)\  ^  w,  (116) 


for  all  t  ^  0.  The  restriction  on  the  magnitude  of  the  wind  disturbance  is  imposed  in  order  to 
ensure  complete  pursuer  controllability,  namely,  that  the  pursuers  are  able  to  reach  any  point  on 
the  plane  in  finite  time. 


The  objective  of  the  pursuers  is  to  intercept  a  target,  whose  kinematics  is  given  by 


Xp  =  up,  Xp(0)  =  Xp0,  (117) 

where  Xp  =  [ xp ,  yp\J  E  M2  is  the  position  of  the  target,  and  up  E  M2  is  its  control  input  such  that 
up  E  Up.  The  set  Up  which  consists  of  all  piecewise  continuous  functions  whose  range  is  included 
in  the  set  Up  =  {u  E  M2,  |u|  ^  q}.  Note  that  the  target  is  not  affected  by  the  wind  field. 

We  assume  that  the  pursuers  have  accurate  measurements  only  of  the  current  position  of  the 
target  at  every  instant  of  time.  One  reasonable  strategy  for  every  pursuer  is  therefore  to  use  the 
Zermelo  navigation  law  in  order  to  intercept  the  target,  that  is,  at  every  instant  of  time,  the  pursuer 
approaches  the  target  with  the  control  law  obtained  by  the  solution  of  the  corresponding  Zermelo 
navigation  problem.  This  control  law  is  optimal  at  t  =  0  if  the  target  remains  stationary  for  all 
t  ^  0.  Starting  at  time  t  =  0,  the  optimal  time  of  arrival  T|N  of  the  ith  pursuer  from  X1Pq  to  Xp0 
is  given  by 

T|n  =  min{T  >  0  :  uT  -  \Xp0  -  X1Pq  -  [  w(t)  dr|  =  0}  (118) 

Jo 

Then  the  Zermelo’s  navigation  control  can  be  obtained  by 

'4n  =  IZ(cos0*,sin6>*)T,  (119) 


where 

0*  =  Arg(XTo  -  X'Po  -  /  ™  w(t)  dr) ,  (120) 

Jo 

for  i  E  I. 


It  can  be  shown  that  a  sequential  pursuit  strategy  in  which  each  pursuer  employs  the  Zermelo’s 
navigation  law  leads  to  capture  of  the  target  by  at  least  one  pursuer. 

We  may  now  propose  the  following  algorithm  to  assign  the  active  pursuers: 
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Dynamic  Assignment  of  Active  Pursuer 


a)  Construct  the  ZVD  and  assign  the  ith  pursuer  to  be  the  active  pursuer  if  the  target  resides 
in  the  corresponding  Zernrelo-Voronoi  cell  Zt. 

b)  At  every  time  step,  generate  the  ZVD  and  assign  the  jth  pursuer  to  be  the  active  pursuer  if 
the  target  resides  in  the  corresponding  Zernrelo-Voronoi  cell  Zj. 

c)  Check  the  distance  between  the  target  and  the  active  pursuer  and  repeat  step  b)  if  the 
distance  is  bigger  than  e.  Otherwise,  terminate  the  procedure  and  return  the  sequence  of 
active  pursuers. 


The  update  algorithm  to  dynamically  generate  the  ZVD  from  one  time  step  to  the  next  when 
a  single  pursuer  has  moved  makes  use  of  the  dual  of  the  VD,  namely,  its  Delaunay  Triangulation. 
In  order  to  update  the  ordinary  VD  we  will,  instead,  update  its  dual  graph.  The  algorithm  for 
updating  the  Zernrelo-Voronoi  diagram  from  the  previous  time  step  to  current  time  step  is  given 
in  Algorithm  1. 


Algorithm  1  Update  Zernrelo  Voronoi  Diagram 

Input:  Coordinates  Vk-i  of  the  generators  at  the  previous  time  step  and  the  corresponding 
Delaunay  triangulation  DT,  coordinates  Vk  of  point  set  at  current  time  step. 

Output:  Updated  Zernrelo  Voronoi  Diagram  and  Delaunay  Triangulation  at  the  current  time  step. 

1:  procedure  UPDATE_DT(DT,Pfc_i,IV) 

2:  while  the  triangulation  DT  is  not  embedded  under  current  coordinates  Vk  do 

3:  Update  DT  by  removing  one  of  the  points  that  cause  the  unenrbedding  (in  our  case  the 

active  pursuer); 

4:  store  the  current  coordinates  of  removed  points  into  set  R\ 

5:  end  while 

6:  if  R  is  not  empty  then 

7:  flip  the  remaining  triangulation  into  a  Delaunay  triangulation; 

8:  end  if 

9:  for  i  =  0  to  length(R)  do 

10:  Update  DT  by  inserting  the  ith  point  in  R  into  the  triangulation; 

11:  end  for 

12:  Transform  DT  into  an  ordinary  Voronoi  diagram  VD; 

13:  Transform  VD  into  the  ZVD  at  current  time. 

14:  return  ZVD  and  DT. 

15:  end  procedure 
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Simulation  Results 


Consider  a  scenario  where  the  target  is  moving  in  a  straight  line  according  to  equation  (117),  where 
ux(t)  =  [—0.4,  —  0.5]t.  Assume  that  there  exist  12  pursuers,  each  having  maximum  unit  speed 
(u  =  1),  which  are  initially  located  at  distinct  positions  determined  by  Vq.  The  wind  field  that 
affects  the  pursuers  is  given  by 


w{t) 


—0.2  —  0.2cos(f) 
0.3 


Figures  |25|  illustrate  the  trajectories  of  the  pursuers  in  the  wind  and  the  moving  target.  Specif¬ 
ically,  Figure  [25|(a)  shows  the  ZVD  formed  by  the  pursuers  at  t  =  0.  As  seen  in  this  figure,  i  =  4  is 
the  active  pursuer  since  the  target  falls  in  the  Zermelo-Voronoi  cell  of  Xp.  Figure  [25|(b)  illustrates 
the  trajectories  of  the  target  and  the  pursuers  in  the  time  interval  [0, 7i],  where  t\  =  2.6  is  the 
switching  time.  The  Zermelo-Voronoi  Diagram  at  t  =  T\  is  also  presented  to  show  that  the  target 

c)  shows  the 


25 


is  about  to  leave  the  Zermelo-Voronoi  cell  of  Xp  and  enter  another  cell.  Figure 
trajectories  of  the  target  and  the  pursuers  from  t  =  t\  to  capture  time  Tc  =  5.0,  as  well  as  the 
Zermelo-Voronoi  Diagram  at  t  =  TC.  In  the  last  time  interval  the  target  is  assigned  to  i  =  5. 


Figure  25:  (a)  Zermelo-Voronoi  Diagram  formed  by  pursuers  at  t  =  0,  Xp  is  the  active  pursuer,  (b) 
Zermelo-Voronoi  Diagram  formed  by  pursuers  at  the  first  switch  time  t  =  2.6,  and  trajectories  of 
pursuers  and  target  for  t  E  [0,  2.6),  (c)  Zermelo-Voronoi  Diagram  formed  by  pursuers  at  the  capture 
time  Tc  =  5.0,  and  trajectories  of  pursuers  and  target  for  t  E  [2.6,  5.0],  Xp  is  the  active  pursuer. 

For  comparison,  note  that  when  only  one  pursuer  tries  to  capture  the  target,  the  shortest 
possible  time  is  Tc  =  7.5.  In  that  case  there  is  a  single  active  pursuer,  namely,  Xp. 


56 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


3.5  Pursuit-Evasion  Games  in  Dynamic  Flow  Fields  via  Reachability  Set  Anal¬ 
ysis 

Despite  the  plethora  of  work  in  this  area,  few  approaches  have  taken  into  consideration  how  dynamic 
environmental  conditions  may  affect  the  outcome  of  the  game.  For  instance,  when  either  the  pursuer 
or  the  evader  (or  both)  is  a  small  autonomous  underwater  vehicle  (AUV)  or  small  unmanned  aerial 
vehicle  (UAV),  the  presence  of  dynamic  sea  currents  or  winds  may  significantly  affect  the  vehicle 
motion.  As  a  result,  during  the  pursuit-evasion  of  these  vehicles,  their  optimal  behavior,  as  the 
solution  of  a  differential  game,  may  be  greatly  affected  by  the  existence  of  the  external  dynamic 
flow  field. 

Some  optimal  control  problems  have  taken  into  account  the  effect  of  an  external  flow  field. 
However,  the  same  level  of  attention  has  not  been  shared  in  the  literature  for  pursuit-evasion  games 
under  the  influence  of  external  disturbances.  Here,  we  consider  a  two-player  pursuit-evasion  game 
in  an  external  dynamic  flow  field.  Due  to  the  generality  of  the  external  flow,  Issacs’  approach  does 
not  readily  yield  feasible  results.  Instead,  we  find  the  optimal  trajectories  of  the  players  through 
the  evolution  of  their  reachable  sets.  We  utilize  the  level  set  method  to  generate  the  reachable 
sets  and  retrieve  the  corresponding  optimal  controls  by  backward  propagation  of  their  respective 
reachable  sets. 


Problem  Formulation 


Consider  a  pursuit-evasion  game  in  an  external  dynamic  flow  field  with  a  single  pursuer  P  and  a 
single  evader  E.  The  dynamics  of  the  pursuer  P  is  given  by 


Xp  —  Up  +  iu(  Xp ,  f ) ,  AP(0)  —  Xp0 , 


(121) 


where  XP  =  [xP,yP]T  E  R2  denotes  the  position  of  the  pursuer,  uP  E  R2  is  the  control  input  of  the 
pursuer  such  that  uP  E  7/P.  The  set  UP  consists  of  all  piecewise  continuous  functions,  whose  range 
is  included  in  the  set  UP  =  {u  E  R2,  |it|  ^  «},  where  |  •  |  represents  the  2-norm.  If  uP  6  ifP,  we 
say  that  uP  is  an  admissible  control  of  the  pursuer.  In  (121),  w(X,t)  E  R2  represents  the  external 
dynamic  flow.  It  is  reasonable  to  assume  that  the  magnitude  of  this  flow  (e.g.  winds  or  currents) 
is  bounded  from  above  by  some  constant,  hence  there  exists  a  constant  w  such  that  \w(X,t)\  <  w, 
for  all  (X,t).  Here,  we  assume  that  the  effects  of  the  external  dynamic  flow  field  on  the  pursuer 
and  evader  are  identical. 


The  objective  of  the  pursuer  is  to  intercept  an  evader,  whose  kinematics  is  given  by 

XE  =  uE  +  vj(Xe,  t ),  Ag(0)  =  XEo,  (122) 

where  XE  =  [xE,yE]T  E  R2  is  the  position  of  the  evader,  and  uE  is  its  control  input  such  that 
uE  E  UE,  which  consists  of  all  piecewise  continuous  functions  whose  range  is  included  in  the  set 
UE  =  {v  E  R2,  |u|  ^  v}.  Again,  uE  is  an  admissible  control  of  the  evader  if  uE  E  UE. 
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The  game  begins  at  the  initial  time  to  =  0  with  initial  positions  XEo,  XPo,  and  terminates  when 
the  pursuer  reaches  the  location  of  the  evader,  that  is,  there  exists  a  terminal  time  T  such  that 
XP(T)  =  Xe(T).  Specifically,  the  game  terminates  if  there  exists  an  admissible  control  uP  £  UP 
of  the  pursuer  such  that  XP(T)  =  XE(T)  for  some  finite  time  T  >  0,  regardless  of  any  admissible 
control  of  the  evader  uE  £  Z7S. 


Problem  Analysis 


Reachable  Sets.  A  reachable  (or  attainable)  set  at  a  given  time  is  defined  as  the  set  of  points 
that  can  be  visited  by  the  agent  at  a  particular  time.  Note  that,  in  our  problem,  the  pursuer  can 
pick  its  speed  within  the  range  of  [0,  u\.  When  the  maximum  external  flow  magnitude  is  smaller 
than  the  pursuer’s  speed,  the  set  of  reachable  points  at  a  given  time  is  identical  to  the  set  of  points 
the  pursuer  can  reach  within  that  time.  If  the  external  flow  is  at  some  time  and  locations  larger 
than  the  speed  of  the  pursuer,  the  reachable  set  is  the  set  of  points  the  pursuer  can  reach  at  that 
time.  The  same  holds  for  the  evader.  In  all  cases,  the  boundary  of  the  reachable  set  is  called 
the  reachability  front.  In  particular,  the  reachable  set  of  the  pursuer  at  time  t  >  0,  denoted  by 
7 ZP(XPQ,t),  is  the  set  of  all  points  IgK2  such  that  there  exists  a  trajectory  satisfying  (115),  with 


initial  position  XPo  and  terminal  position  X  at  time  t.  Similarly,  the  reachable  set  7 ZE(XEo,t)  of 
the  evader  at  time  t  >  0  is  the  set  of  all  points  Y  £  M2  such  that  there  exists  a  trajectory  satisfying 


(122),  with  initial  position  XEo  and  terminal  position  Y  at  time  t.  The  reachability  fronts  of  the 
pursuer  and  the  evader  at  time  t  >  0  are  denoted  by  dlZP(XPo,t )  and  <97 ZE(XBo,t),  respectively. 
We  denote  by  lZ*E(XEo,t )  the  usable  reachable  set  of  the  evader,  which  is  the  set  of  all  terminal 
points  (at  time  t)  of  admissible  trajectories  of  the  evader  that  do  not  pass  through  the  reachable 
set  of  the  pursuer  at  any  time  in  the  interval  [0,  t] .  In  other  words,  1ZE(XEo,t)  is  the  set  of  terminal 
points  of  all  the  ‘safe’  evader  trajectories. 

Proposition  3.  Let  T  =  inf{t  £  M  :  7 Zf,(XEo,t)  =  0}.  If  T  <  oo,  then  capture  is  guaranteed 
for  any  time  greater  than  T ,  while  the  evader  can  always  escape  within  a  time  smaller  than  T . 
That  is,  T  is  the  optimal  time-to-capture.  Moreover,  let  Xf  denote  the  location  where  the  evader  is 
captured  by  the  pursuer.  Then  Xf  lies  on  the  intersection  of  the  reachability  front  of  the  pursuer 
dIZP(XPo,T)  and  the  reachability  front  of  the  evader  dIZE(XEo,T) . 

Remark  2.  In  most  cases,  such  as  when  u  >  v,  the  safe  reachable  set  of  the  evader  1ZE(XEo,t) 
satisfies 

'R-*Ei.XEo,t)  =  IZE(XEo,  t)\IZP(XPo,  t),  (123) 

for  all  t  >  0.  In  such  cases,  the  condition  IZ*E(XEoit)  =  0  is  equivalent  to  the  condition  IZE(XEo,t)  C 
lZP(XPo,t).  Then,  the  optimal  time-to-capture  is  the  first  time  when  the  reachable  set  of  the  pursuer 
TZP(XPo,t )  completely  covers  the  reachable  set  of  the  evader  lZE(XEo,t).  When  u  <  v  (the  relative 


maximum  speed  of  the  evader  is  larger  than  the  maximum  speed  of  the  evader)  the  relation  ( 123 ) 
may  not  always  hold-some  admissible  evader  trajectories  may  temporarily  enter  the  reachable  set  of 
the  pursuer  and  exit  later.  In  these  cases,  IZ*E(XEo,t )  can  be  determined  by  treating  the  reachable 
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set  of  the  pursuer  as  a  dynamic  ‘forbidden’  region  for  the  evader.  However,  the  examples  shown  in 
this  paper  use  u  >  v,  which  allows  us  to  exploit  (123). 


Proposition [3] is  valid  under  the  assumption  that  capture  is  guaranteed  for  some  finite  time.  We 
provide  a  sufficient  condition  for  this  to  occur  in  the  next  theorem. 

Theorem  3.  Assume  w(X,t)  satisfies  the  triangle  inequality  and  its  norm  is  bounded  from  above 
by  a  constant  X,  where  A  <  u  —  v.  Then  the  game  terminates  in  finite  time  regardless  of  the  initial 
positions  between  the  pursuer  and  the  evader.  Furthermore,  the  time-to- capture  satisfies  the  upper 
bound 

T  <  \X_Eo  Xp°\.  (124) 

u  —  v  —  A 


Level  Set  Method.  In  order  to  construct  the  forward  reachable  sets  of  the  pursuers  and  the 
evader,  we  utilize  the  level  set  method.  The  level  set  method  is  a  convenient  tool  to  track  the 
evolution  of  the  reachability  front.  It  evolves  an  interface  (front)  by  embedding  it  as  a  hyper¬ 
surface  in  a  higher  dimension,  where  time  is  the  augmented  dimension.  Automatic  handling  of 
merging  and  pinching  of  fronts  and  other  topological  changes  are  made  possible  by  such  higher 
dimensional  embedding.  Level  sets  provide  an  implicit  representation  of  the  front,  which  offers 
several  advantages  over  an  explicit  representation. 

For  any  c  E  M,  the  c- level  set  of  a  function  0  :  M2  — >  M  is  the  set  {X  e  M2|0(A)  =  c}.  We 
consider  the  signed  distance  function 

{min  \X  —  Y\,  if  A  is  outside  the  front, 

Y£&R  (125) 

—  min  \X  —  T|,  if  A  is  inside  the  front. 

YedK 

The  signed  distance  function  is  one  of  the  most  commonly  used  implicit  functions  in  level  sets.  It 
is  smooth  and  monotonic  across  the  interface.  It  also  keeps  fixed  amplitude  gradients  in  the  held. 
For  all  A  6  dlZ,  we  have  0(A)  =  0.  That  is,  the  zero  level  set  implicitly  represents  the  reachability 
front.  Moreover,  the  reachable  set  can  be  represented  by  {A"  £  R2|0(A)  <  0}. 

The  reachability  front  dlZP(XPo,t )  of  the  pursuer  is  governed  by  the  viscosity  solution  of  the 
Hamilton-Jacobi  (HJ)  equation 

_|_  u  |V0P|  +  w(X,  t)  •  V0P  =  0,  (126) 

with  initial  conditions  0P( A,  0)  =  |A  —  APo|.  Moreover,  the  reachable  set  of  the  pursuer  coincides 
with  the  region(s)  where  0P  is  non-positive.  Similarly,  the  reachability  front  dlZE(XEo,t )  of  the 
evader  is  given  by  the  HJ  equation 

dMXH)  +^|V0£|+w(A,f)-V0g  =  O,  (127) 
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with  initial  conditions  4>E(X,  0)  =  \X  —  XEo 


In  the  case  when  v  >  u,  we  need  to  track  the  propagation  of  dlZ%(XEo,t).  Its  reachability  front 

wing  modified  version  of  the  Harnil 

+  Ht)  I  V</>*  |  +  w(X,  t )  •  V</>*  =  0, 


can  be  computed  by  solving  the  following  modified  version  of  the  Hamilton- Jacobi  equation  (127): 

dc/)*E{X,  t ) 


dt 


where 


v(t)  = 


v ,  if  (j)P( X,  t)  >  0, 
u,  otherwise. 


(128) 


(129) 


The  main  idea  is  to  propagate  1Z*E  (XEo ,  t)  with  the  maximum  speed  of  the  evader  v  when  it  is 
outside  the  reachable  set  of  the  pursuer,  and  to  keep  pace  with  the  propagation  of  dlZP(XPo,t ) 
when  the  front  of  the  evader  enters  the  reachable  set  of  the  pursuer  to  make  sure  that  it  never 
grows  out  of  the  reachable  set  of  the  pursuer  again.  Note  that  lZ%(XEo,t)  is  represented  by  the 
area  {X  €  M2|^J(Jf,  t)  <  0  and  <j>P(X,t)  >  0}. 


Time-Optimal  Paths. 


The  location  Xf  where  the  evader  is  captured  by  the  pursuer  is  the  intersection  between  dlZP(XPo,T) 
and  dlZE(XEo,T).  After  we  have  identified  the  (common)  terminal  position  of  the  pursuer  and  the 
evader,  we  can  retrieve  the  optimal  trajectories  and  optimal  controls  of  both  players  by  backward 
propagation  along  the  reachable  sets.  In  particular,  the  time-optimal  trajectories  X*  and  X*  satisfy 
the  following  differential  equations,  when  cj>p  and  4>e  are  differentiable: 

+w(X*,t),  (130) 

I V  (f>p\ 

+  w{X*,t).  (131) 

Hence,  the  corresponding  time-optimal  controls  of  the  pursuer  and  the  evader  are 


dx; 

dt 

d XI 
dt 


u 


* 

P 


_  V(/>P 

U\^P 


*  -Vi i>E 

=  V-, 


|V^ 


(132) 


Simulation  Results 

In  this  section,  we  apply  our  method  to  a  problem  with  a  more  realistic  representation  of  the  flow 
field.  In  particular,  we  consider  a  wind  field  approximation  generalized  from  the  Rankine  model  of 
vortex: 

ns 

w(X)  =  w0  +  ^2uiAi(X  -  xSi),  (133) 

i=l 
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where 


OJi  = 


maxjrf.,  \\X  —  ccSi ||2} 


(134) 


In  (133)  ns  is  the  number  of  flow  singularities,  xSi  is  the  location  of  the  i-th  flow  singularity 


and  rSi  denotes  the  singularity  radius.  is  a  2  x  2  matrix,  whose  structure  captures  the  local 
characteristics  of  the  z-th  flow  singularity.  The  model  approximates  the  velocity  field  of  a  vortex 
with  a  linear  vector  field  inside  a  disk  and  the  velocity  outside  of  the  disk  decreases  as  the  inverse 
squared  distance  to  the  center  of  the  disk. 


For  our  numerical  simulation,  we  set  the  number  of  flow  singularities  to  ns  =  3.  The  locations 
of  the  flow  singularities  are  xsi  =  [3,2]T,  xS2  =  [ — 3, 4] T ,  xS3  =  [0,  —  5]T,  and  the  corresponding 
radii  are  rSl  =  3,  rS2  =  2,  rS3  =  3,  respectively.  The  local  wind  field  matrices  are  given  by 


A  i 


0  0.3 

-0.15  0  ’ 


d-2 


0.4  0.2 

0  -0.2  ’ 


^3 


0.2  0.1 

-0.2  0.2 


We  also  choose  wq  =  [0,  ()]. 


The  reachable  fronts  of  the  pursuer  and  the  evader  at  the  terminal  time  are  shown  in  Fig¬ 
ure  a) .  The  corresponding  optimal  trajectories  of  the  pursuer  and  the  evader  are  shown  in 
Figure  [26|(b).  Expansion  of  the  level  sets  and  the  backward  propagation  step  to  find  the  optimal 
paths  are  depicted  in  Figures [27{a)  and [27](b)  for  the  pursuer  and  the  evader,  respectively. 


3.6  Min-Max  Differential  Dynamic  Programming:  Continuous  and  Discrete 
Time  Formulations 

Differential  game  theoretic  or  min-max  formulations  are  important  extensions  of  optimal  control 
having  direct  connections  to  robust  and  H°°  nonlinear  control  theory.  While  there  has  been  a  lot 
of  work  on  DDP  algorithms  and  applications,  most  of  the  work  is  on  discrete  time  formulations 
and  for  cases  where  there  are  no  disturbances.  Given  all  this  existing  work  ,  our  contribution 
in  this  paper  is  the  derivation  of  min-max  DDP  in  continuous  time  and  its  comparison  with  the 
discrete  time  formulation.  In  particular,  we  provide  a  set  of  backward  differential  equations  for 
both  continuous  and  discrete  time  that  are  easy  to  implement  and  derive  the  optimal  policies  for 
the  two  players/controllers.  We  compare  the  continuous  and  discrete  time  formulations  in  terms 
of  convergence  and  numerical  efficiency.  In  addition  we  investigate  the  effect  that  the  min-max 
formulation  has  in  the  feed-forward  and  feedback  parts  of  the  optimal  control  policies. 
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X 


(a) 


(b) 


Figure  26:  (a)  Red  and  blue  curves  represent  the  reachable  fronts  of  the  pursuer  and  the  evader 
at  the  terminal  time,  respectively.  They  intersect  at  Xf,  where  the  pursuer  captures  the  evader 
eventually,  (b)  Optimal  trajectories  of  the  pursuer  and  the  evader  in  red  and  dotted  blue  lines, 
respectively,  generated  from  the  reachable  set  approach.  The  wind  field  is  depicted  in  the  back¬ 
ground. 


Game  Theoretic  Differential  Dynamic  Programming  in  Continuous  Time 

On  the  differential  game  theoretic  setting,  we  have  the  following  min-max  problem: 

R(x(to),  to)  =  minmax  tf)  +  J  £(x,  u,  v,  t)dt (135) 

subject  to  the  dynamics 


dx  =  F(x,  u,  v,  i)dt,  (136) 

where  V  stands  for  the  performance  index,  x(t)  is  an  n-dimensional  vector  function  of  time  describ¬ 
ing  the  state  of  the  dynamic  system  at  t  E  [0,f/].  C  and  4>  are  scalar  functions  of  their  arguments, 
where  £(x,  u,  v,t)  is  the  running  cost  and  (j>(x(tf),tf)  is  the  terminal  cost,  u  is  an  m-dimensional 
vector  function  that  represents  the  stabilizing  control  of  the  system,  whose  objective  is  to  minimize 
the  performance  index.  Whereas  v  being  a  ^-dimensional  vector  represents  the  destabilizing  control 
of  the  system,  that  tries  to  maximize  the  performance  index. 

In  continuous  time,  the  analysis  starts  with  the  Hamilton-Jacobi-Bellman  Isaacs  (HJBI)  partial 
differential  equation.  More  precisely,  we  have: 
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(a)  (b) 

Figure  27:  (a)  Optimal  trajectory  of  the  pursuer  in  red  dots  overlaid  on  intermediate  reachability 
front  contours  in  blue,  (b)  Optimal  trajectory  of  the  evader  in  red  dots  overlaid  on  intermediate 
reachability  front  contours  in  blue. 


8V(x,t) 

dt 


=  min  max  |£(x,  u,  v,  t)  +  V^(x,  t)TF(x,  u,  v,  t)\, 

U  V  L  J 


with  boundary  condition 


V(x,tf)  =  </>(x(tf),tf). 


(137) 

(138) 


Given  an  initial/nominal  trajectory  of  the  state  and  control  (x,  u,  v),  and  letting  5x  =  x  —  x, 
du  =  u  —  u,  <5v  =  v  —  v,  the  linearized  dynamics  can  be  represented  as 


dx 

dt 

ddx 

dt 


F(x  +  dx,  u  +  <5u,  v  +  <5v,  t ), 
=  Fx6x  +  Fu<5u  +  Fv5v, 


(139) 

(140) 


where  hFx,  Fu,  and  Fv  stand  for  Fx(x,  u,  v,  t),  Fu(x,  u,  v,  t ),  Fv(x,  u,  v,  t),  respectively.  Henceforth, 
the  arguments  for  the  functions  V,  F,  etc,  are  omitted  for  brevity  unless  otherwise  specified,  and  a 
bar  7  is  attached  when  they  are  evaluated  at  the  nominal  trajectory  (x,  u,  v). 


The  main  idea  here  is  to  take  expansions  of  the  terms  in  both  sides  of  the  equation  ( 137)  around 
the  nominal  state  and  control  trajectories  (x,  u,  v)  to  derive  the  update  law  for  the  stabilizing 
control,  destabilizing  control  and  backward  differential  equations  for  the  zeroth,  first  and  second 
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order  approximation  terms  of  the  value  function.  After  dornr  mathematical  manipulations,  we 
obtain 


<5u*  =  -QUu(Qux<5x  +  Quv5v  +  Qu), 
dv*  =  -QvviQvxfo  +  Qvu6u  +  Qv), 


where  Qvu  =  Quv>  and 


Qx  =  4T1 4  +  Ac, 

Qxx  =  ^XX  A  2  \  xx  f  x . 
Qux  =  A’lfhxx  +  £UX) 


Qv  =  Fjvx  +  £v, 


Qu  =  Fjvx  +  £u, 

Quu  —  22uu,  Qw  —  22VV) 

Qvx  —  Fv  Vxx  T  22VX)  Quv  —  Fuv 


(141) 

(142) 


(143) 


Notice  that  <5v  is  still  in  the  previous  expression  of  hu*.  We  need  to  replace  the  5v  term  in 


(141)  with  (142)  and  solve  for  <5u*.  Similarly,  we  can  solve  for  5v*.  The  final  expressions  for  hu* 


and  hv*  are  specified  as  follows: 


u*  =  lu  +  Ku5x  and  5v*  =  lv  +  K  v5x, 

v,  lu  and  feedback  gains  Kv,  Ku  defined  as: 

(144) 

( Quu  QuvQwQvu)  (Qu  QuvQvvQv)? 

(145) 

(Qw  QvuQuuQuv)  (Qv  QvuQuuQu)? 

(146) 

—  (Quu  QuvQvvQvu)  (Qux  QuvQvvQvx)) 

(147) 

(Qw  QvuQuuQuv)  (Qvx  QvuQuuQux)- 

(148) 

The  next  step  is  to  substitute  the  optimal  control  (141)  and  disturbance  (destabilizing  control) 


(142)  to  the  HJBI  equation  (137)  in  order  to  find  the  update  law  of  the  value  function  and  its  first 


and  second  order  partial  derivatives.  After  collecting  terms  as  zeroth  order,  first  order  and  second 
order  expressions  of  <5x,  we  can  equate  the  coefficients  of  hx  and  readily  obtain  the  backward 
propagation  equations  with  respect  to  the  value  function  and  its  first  and  second  order  partial 
derivatives.  These  backward  differential  equations  are  expressed  as  follows 

- -jj-  =  F  +  luQu  +  lvQv  +  “luQuulu  +  ljQuvlv  +  “lvQvvlv, 


_  dhx 
dt 

dT4x 

(It 


=  QX  +  K  lQu  +  K  TQv  +  Qlxlu  +  QTxW  +  KTQUU1U  +  KjQuvlv 
+  KVQVU1U  +  Kv  Qvvlv, 

=  2  KlQux  +  2  RTqvx  +  2KTQVUKU  +  K^QUUKU  +  K^QVV  Kv  + 


(149) 
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Game  Theoretic  Differential  Dynamic  Programming  in  Discrete  Time 


In  the  discrete-time  approach,  the  problem  is  discretized  along  the  time  interval.  Similar  to  the 
HJBI  equation  in  continuous-time  case,  our  analysis  in  discrete  time  starts  with  a  variation  of  the 
Bellman  equation  for  the  min-max  case  with  u (4)  and  v(4): 


V(x(4),4)  =  min  max 

u (4)  v(tfe) 


L(x(4),u(4),  v(4),4)  +  V{jc(tk+i),tk+i) 


(150) 


Consider  again  the  dynamical  system  dx  =  F(x,  u,  v)d t.  The  linearized  dynamics  model  around 
x(tfc),  u(tfc),  v(tfc)  in  discrete  time  can  be  written  as  5x(4+i)  =  A (4)<5x(4)  +  Bu(4)<Tu(4)  + 
B v(tk)5v(tk),  where  A (4)  =  I  +  fx(4)di,  Bu(4)  =  fu(4)di  and  Bv(4)  =  fv(4)dt  .  8x(tk)  = 
x(4)  —  x(4),  8u(tk)  =  u (tk)  —  u (tk)  and  5v(4)  =  v(4)  —  v(4)  are  defined  as  the  deviations  from 
the  nominal  trajectory. 

Let  Q(x(4),u(4),v(4))  =  L(x(4),u(4),  v(4),4)  +  T(x(4+i),4+i)-  We  obtain 

5u(tk)*  =  -Quu(Qux<5x(4)  +  Quv8v(tk)  +  Qu), 

i  /  ,  (151) 

Sv(tk)  =  -Qvv(Qvx5x(tfc)  +  Qvu5u(t.k)  +  Qv), 


where  Qvu  =  QuV>  an(l 

Qx  =  I4(4+i)TA(4)  +  Lx(4),  Qu  =  Px(4+i)TBu(4)  +  Lu(tk), 

Qv  =  Px(4+i)tBv(4)  +  Lv(4), 

Qxx  =  A(4)  Pxx(4+i)A(4)  t  LXx(4))Quu  =  Bu(4)  Pxx(4+i)Bu(4)  T  Luu(4), 

Qvv  =  (4)TPxx  (4+i)Bv(4)  T  Z/Vv 

Qxu  =  A(4)  Txx(4+i)Bu(4)  T  Lxu(4),  Qxv  =  A(4)  Pxx(4+i)Bv(4)  -j-  Lxv(t/C), 

Quv  =  Bu(4)  Pxx(4+i)Bv(4)  t  Luv(4). 


Solving  the  system  of  equations  in  (151)  results  in  the  expressions 


5u(4)*  =  lu  +  Kudx(4),  <5v(4)*  =  lv  +  Kv<fx(4), 


where 

Tu  =  -(Quu  -  QuvQvvQvu)_1(Qu  -  QuvQvvQv), 

Tv  =  -(Qvv  -  QvuQuuQuv)  1  (Qv  -  QvuQuuQu), 
Ku  =  -(Quu  -  QuvQvvQvu)"1(Qux  -  QuvQvvQvx), 
Kv  =  -(Qvv  -  QvuQuuQuv)_1(Qvx  -  QvuQuuQux)- 


(153) 

(154) 

(155) 

(156) 

(157) 
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By  plugging  the  optimal  control  updates  Su(tk)*  and  6v(tk)*  into  the  value  function,  we  can  split 
the  value  function  into  zero,  first  and  second  order  terms  in  dx(tk)  such  that  V(x(tk)  +  Sx(tk))  = 
V(tk)  +  Vx.(tk)T 5x(tk)  +  lSx(tk)TVlcx(tk)8x(tk),  where  V(tk),  14(4)  and  Vx x(4)  are  computed  as 

V  (4)  =  V  (tjk+l)  +  f^Qu  +  T^Qv  +  “O^Quuiu  +  f^QvvTv  +  l^QuiJv  +  fjQviJu)i 

14(4 )  =  Qx  +  K^QU  +  KyQv  +  QxJu  +  Qxviv  +  K^QuuTu  +  KyQvvIv  +  K^QUvTv  +  KvQviJui 

Ixx(4)  =  Qxx  +  K„XQUX  +  KyQv  +  Qxu^u  +  Qxv4  +  K^QUUKU  +  K^QVVKV  +  K^QuvKv  +  KvQvuKu 

(158) 

Since  the  optimal  cost-to-go  is  computed  backward  in  time,  this  computational  scheme  is  called 
the  backward-sweep  in  trajectory  optimization. 

The  boundary  conditions  at  t  =  tf  for  the  backward  propagations  are 

V{tf)  =  <j>(x(tf),tf), 

Vx(tf)  =  <£x(x(f/),f/),  (159) 

Vxx(tf)  =  </>xx(x(f/),  tf). 


Comparison  between  Continuous  and  Discrete  GT-DDP 

Besides  the  form  of  the  backward  differential  equations,  one  of  the  major  differences  between  the 
discrete  and  continuous  time  formulations  is  on  the  specification  of  the  terms  Quu  and  Qvv.  In 
the  continuous  case  these  terms  are  specified  by  £uu  and  £vv  and  therefore  they  are  completely 
specified  by  the  user.  This  is  not  the  case  with  the  discrete  time  formulation  of  min-max  DDP 
in  which  the  terms  QUu  and  Qvv  are  also  functions  of  I4xj  besides  £uu  and  £vv  •  The  result  of 
this  observation  is  that  for  the  discrete  time  case  the  positive  definiteness  of  Quu  and  the  negative 
definiteness  of  Qvv  along  the  nominal  trajectories  are  not  guaranteed.  This  is  not  the  case  with 
the  continuous  time  formulation  of  GT-DDP  and  therefore  the  continuous  version  is  numerically 
more  stable  that  the  discrete  time. 

In  terms  of  running  time  comparison  of  the  two  algorithms,  continuous  time  GT-DDP  requires 
the  usage  of  differential  equation  solvers,  while  only  arithmetic  calculations  are  needed  in  discrete 
time  GT-DDP.  Therefore,  the  time  cost  per  iteration  is  faster  in  the  discrete  time  GT-DDP  case, 
where  each  iteration  is  given  by  an  update  of  the  controls. 


Examples 

In  this  subsection,  we  first  apply  our  algorithms  on  some  minimax  problem,  then  we  demonstrate 
how  our  algorithms  can  also  be  utilized  to  bring  some  systems  to  their  goal  states  under  stochastic 
disturbances. 
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(a)  Cost  per  iteration  under  discrete  GT-(b)  Cost  per  iteration  under  continuous  GT- 

DDP.  DDP. 


Figure  28:  Comparison  in  cost  per  iteration. 


Inverted  Pendulum  Problem.  We  first  apply  our  discrete  and  continuous  algorithms  on  the 
inverted  pendulum  with  conflicting  controls.  In  particular,  the  dynamics  is  given  by  19  +  bd  — 
mg£  sin  9  =  u  +  v,  where  the  parameters  are  chosen  in  the  simulations  as  m  =  1kg,  i  =  0.5m, 
b  =  0.1,  /  =  ml2 ,  g  =  9.81kg  •  m/sec2.  Our  goal  is  to  bring  the  pendulum  from  the  initial  state 

[M  =  M]to[M  =  [o,o]. 


We  cast  this  problem  as  a  trajectory  optimization  problem.  The  cost  function  is  given  by 

rtf 


J  = 


10 


(xiQx  +  u1I?uu  —  vlRvv), 


(160) 


where  x  =  [9,  6} 1 ,  Q  = 


1,  0 
0,  0.1 


and  Ru  =  0.01,  Rv  =  1. 


We  set  the  initial  control  to  be  u  =  0,  v  =  0  and  the  terminal  time  to  be  tf  =  1.  The  multiplier 
7  =  0.8  in  the  continuous  case.  Initial  controls  and  the  corresponding  initial  trajectories  of  the 
states  are  identical  in  both  cases. 


We  run  the  algorithms  for  30  iterations  and  the  convergence  is  achieved  in  both  cases.  The 
two  algorithms  yield  the  same  results  in  terms  of  optimal  trajectories  and  corresponding  optimal 
controls,  as  expected.  As  can  be  seen  in  Figure  [28^,  and[28]b,  the  cost  converges  in  10  iterations  in 
the  discrete  case.  On  the  other  hand,  4  iterations  are  needed  for  the  convergence  of  the  cost  in  the 
continuous  case. 


Quadrotor.  The  dynamic  model  of  the  quadrotor  includes  16  states:  3  for  the  position  (r  = 
(x,  y,  z)T),  3  for  the  Euler  angles  (<!>  =  (0,  9,  V’)T))  3  for  the  velocity  (r  =  (x,  y,  i)T),  3  for  the  body 
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angular  rates  (<I>  =  (p,  q,  r)T)  and  4  for  the  motor  speeds  (H  =  (aq,  u>2,  UI3,  W4)T).  The  corresponding 
dynamics  of  the  quadrotor  with  conflicting  controls  is  given  as  follows: 

cix 

—  = /(x) +  G(u- v),  (161) 

where  x  =  [r,  <L,  r,  <F,  P]T  E  M16,  and  u  =  (iq,  U2,  U3, 114)^  E  M4  is  the  stabilizing  control  vector, 
where  u\  represents  the  thrust  force,  and  112,  U3, 114  represent  the  pitching,  rolling,  yawing  moments, 
respectively.  ui,  112,113  and  U4  are  deviations  from  the  nominal  vector  (uh,  0, 0, 0)T.  v  E  M4  denotes 
the  destabilizing  control.  The  corresponding  cost  function  is  defined  as  J  =  \  f(l)f  [(x  —  p)TQ(x  — 
p)  +  u1  i?uu  —  vr/?vv]  +  |(x  —  p)TQf(x  —  p),  where  p  E  M16  denotes  the  desired  terminal  states. 
The  state  dependent  running  cost  is  included  for  better  convergence.  In  the  simulation,  we  set 


p(i) 


27 r,  i  =  5; 

0,  otherwise, 


(162) 


and  Q  =  0.01  Qf.  Ru  =  0.0001/  and  Rv  =  0.0005/.  7  =  0.5. 

The  desired  terminal  state  p  is  chosen  such  that  the  quadrotor  would  change  its  pitch  angle  for 
27t  and  return  to  its  original  hovering  position.  50  iterations  are  included  to  ensure  the  convergence. 
The  corresponding  optimal  state  trajectories  are  shown  in  Figure  [29] 


3.7  Solution  of  Stochastic  Differential  Games  using  Forward  and  Backward 
Stochastic  Differential  Equations 

In  this  part  of  the  work,  we  focused  our  research  on  designing  an  efficient  algorithm  to  numerically 
solve  a  large  class  of  stochastic  differential  game  problems.  The  algorithm  is  a  sampling-based 
scheme  which  relies  on  the  theory  of  forward  and  backward  stochastic  differential  equations  (FB- 
SDEs)  and  their  connection  to  backward  PDEs.  In  particular,  we  first  obtain  a  probabilistic 
representation  of  the  solution  to  the  Hamilton  Jacobi  Isaacs  (HJI)  PDE,  expressed  in  the  form  of 
a  system  of  FBSDEs.  This  system  of  FBSDEs  is  then  simulated  by  employing  linear  regression 
techniques.  Since  the  HJI  PDE  appears  in  both  stochastic  differential  games  and  risk-sensitive 
optimal  control  problems,  the  proposed  scheme  is  applicable  to  both  types  of  stochastic  optimal 
control  formulations. 


Problem  Statement 

Let  (f2,  J^",  {^t}t>o,  T)  be  a  complete,  filtered  probability  space  on  which  a  p-dimensional  standard 
Brownian  motion  Wt  is  defined,  such  that  {^t}t> 0  is  the  natural  filtration  of  II7  augmented  by  all 
P-null  sets.  Consider  the  game-theoretic  setting  in  which  the  expected  game  payoff  is  defined  by 
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Figure  29:  Optimal  trajectories  of  the  states  in  blue.  Dashed  red  lines  represent  the  desired  terminal 
states. 


the  functional 


T 

P(t,xt-,u(-),v(-))  =  E  g(xT)  +  j  [q{t,xt) +  \ul Rut- \vj Qvt]dt 


(163) 


associated  with  the  stochastic  controlled  system,  which  is  represented  by  the  Ito  stochastic  differ¬ 
ential  equation  (SDE) 


dxt  =f(t ,  xt)dt  +  G(t,  xt)utdt  +  L(t,  xt)vtd t  +  £(t,  xt)dWt,  t  E  [r,  T\, 
x(t)  =  xT, 


(164) 


where  T  >  r  >  0,  T  is  a  fixed  time  of  termination,  x  E  Mn  is  the  state  vector,  u  E  is  the 
minimizing  control  vector,  and  v  E  is  the  maximizing  control  vector.  Furthermore,  R  and  Q 
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are  respectively  v  x  v  and  [x  x  g,  positive  definite  matrices,  g  :  Mn  — >  R,  q  :  [r,  T]  x  Rn  — >•  R, 
/  :  [0,  T]  x  I"  4  Rn,  G  :  [0,  T\  x  Rn  -»  Rnxu,  L  :  [0,  T]  x  K"  ->  Rnx^  and  E  :  [0,  T]  x  Rn  -»  RnxP 
are  deterministic  functions,  that  is,  they  do  not  depend  explicitly  on  w  G  fl.  We  assume  that  all 
standard  technical  conditions  which  pertain  to  the  filtered  probability  space  and  the  regularity  of 
functions  are  met,  in  order  to  guarantee  existence,  uniqueness  of  solutions  to  (164),  and  a  well 
defined  payoff  (163).  These  impose,  for  example,  that  the  functions  g,  q,  /,  G,  L  and  £  are 
continuous  w.r.t.  time  t  (in  case  there  is  explicit  dependence),  Lipschitz  (uniformly  in  t)  with 
respect  to  the  state  variables,  and  satisfy  standard  growth  conditions  over  the  domain  of  interest. 
Furthermore,  the  square-integrable  processes  u  :  [0,  T]  x  fi  — >  U  C  Ru  and  v  :  [0,  T]  x  11  — >  V  C  R^ 
are  o-adapted,  which  essentially  translates  into  the  control  inputs  being  non-anticipating, 

i.e. ,  relying  only  on  past  and  present  information. 


The  intuitive  idea  behind  the  game-theoretic  setting  is  the  existence  of  two  players  of  conflicting 
interests.  The  first  player  controls  u  and  wishes  to  minimize  the  payoff  P  over  all  choices  of  v, 
while  the  second  player  wishes  to  maximize  P  over  all  choices  of  u  of  his  opponent.  At  any  given 
time,  the  current  state,  as  well  as  each  opponents’  current  control  action  is  known  to  both  players. 
Furthermore,  instantaneous  switches  in  both  controls  are  permitted,  rendering  the  problem  difficult 
to  solve  in  general. 


The  Value  Function  and  the  HJI  Equation 


For  any  given  initial  condition  (r,  xr),  we  investigate  the  game  of  conflicting  control  actions  u,  v  that 


minimize  (47)  under  all  admissible  non- anticipating  strategies  assigned  to  tt(-),  while  maximizing 
it  over  all  admissible  non-anticipating  strategies  assigned  to  v(-).  For  the  structure  imposed  on 
this  problem  by  the  form  of  the  cost  and  dynamics  at  hand,  the  Isaacs  conditioi^]  holds,  and  the 
payoff  is  a  saddlepoint  solution  to  the  following  terminal  value  problem  of  a  second  order  partial 
differential  equation,  known  as  the  Hamilton- Jacobi-Isaacs  (HJI)  equation,  which  herein  takes  the 
form 


Vt  +  inf  sup 
ueU 


V(T,  x)  =  g(x),  xe 


^tr(I4 x££T)  +  Vj (/  +  Gu  +  Lv)  +  q  +  uT Ru  —  ^vTQv  >  =  0,  (t,  x )  E  [0,  T)  x 


(165) 

In  the  above,  function  arguments  have  been  suppressed  for  notational  compactness,  and  Vx  and 
Vxx  denote  the  gradient  and  the  Hessian  of  V,  respectively.  The  term  inside  the  brackets  is  the 
Hamiltonian.  For  the  chosen  form  of  the  cost  integrand,  and  assuming  that  the  optimal  controls 


lie  in  the  interiors  of  U  and  V,  we  may  carry  out  the  infimum  and  supremum  operations  in  (165) 


explicitly  by  taking  the  gradient  of  the  Hamiltonian  with  respect  to  u  and  v  and  setting  it  equal 


1Tlie  Isaacs  condition  renders  the  viscosity  solutions  of  the  upper  and  lower  value  functions  equal,  thus  making 
the  order  of  minimization/maximization  inconsequential. 
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to  zero,  and  therefore,  for  all  (t,x)  G  [0,  T]  x  Mn,  the  optimal  controls  are  given  by 


u*(t,x)  =  —R  1GT(t,  x)Vx(t,  x),  (166) 

v*(t,  x)  =  Q_1LT(t,  x)Vx(t,  x).  (167) 

Inserting  the  above  expression  back  into  the  original  HJI  equation  and  suppressing  function  argu¬ 
ments  for  notational  brevity,  we  obtain  the  equivalent  characterization 

Vt  +  ltr(I4r£XT)  +  Vjf  +  q-  1  Vj  (gR~1Gt  -  LQ_1LT^  Vx  =  0,  (t,  x )  G  [0,  T)  x  Mn, 

V(T,x)=g(x),  x  G  Mn. 

(168) 


A  Feynman-Kac  Representation  through  FBSDEs 


There  is  a  close  relationship  between  stochastic  differential  equations  and  second-order  partial 
differential  equations  (PDEs)  of  parabolic  or  elliptic  type.  Specifically,  solutions  to  a  certain  class 
of  nonlinear  PDEs  can  be  represented  by  solutions  to  forward-backward  stochastic  differential 
equations  (FBSDEs),  in  the  same  spirit  as  demonstrated  by  the  well-known  Feynman-Kac  formulas 
for  linear  PDEs.  We  begin  by  briefly  reviewing  FBSDEs. 

As  a  forward  process  we  shall  define  the  square-integrable,  {^s}s>o-adapted  process  A(-)j^} 
which,  for  any  given  initial  condition  (t,x)  G  [0,  T\  x  Mn,  satisfies  the  Ito  FSDE 


dXs  =  b(s,Xs)ds  +  Y{s,Xs)dWs,  s  G  [t,T\, 
Xt  =  x. 


(169) 


The  forward  process  (169)  is  also  called  the  state  process  in  the  literature.  We  shall  denote  the 
solution  to  the  forward  SDE  (169)  as  Xl'x ,  wherein  (t,x)  are  the  initial  condition  parameters. 


In  contrast  to  the  forward  process,  the  associated  backward  process  is  the  square-integrable, 
{^s}s>o-adapted  pair  (K(-),  Z(-))  defined  via  a  BSDE  satisfying  a  terminal  condition 


dFs  =  -h(s,  Xa,  Ys,  Zs)ds  +  Zj dWs  s  G  [t,  T ], 
Yt  =  g(XT). 


(170) 


The  function  h(- )  is  called  the  generator  or  driver.  The  solution  is  implicitly  defined  by  the  initial 
condition  parameters  (t,x)  of  the  FSDE  since  it  obeys  the  terminal  condition  g(x!f,x).  We  will 
similarly  use  the  notation  Yg'x  and  Zl’x  to  denote  the  solution  for  a  particular  initial  condition 
parameter  (t,  x)  of  the  associated  FSDE. 

2While  X  is  a  function  of  s  and  co,  we  shall  use  Xs  for  notational  brevity. 
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While  FSDEs  have  a  fairly  straightforward  definition,  in  the  sense  that  both  the  SDE  and 
the  filtration  evolve  forward  in  time,  this  is  not  the  case  for  BSDEs.  Indeed,  since  solutions  to 
BSDEs  need  to  satisfy  a  terminal  condition,  integration  needs  to  be  performed  backwards  in  time 
in  some  sense,  yet  the  filtration  still  evolves  forward  in  time.  It  turns  out  that  a  terminal  value 
problem  involving  BSDEs  admits  an  adapted  (i.e. ,  non-anticipating)  solution  if  we  back-propagate 
the  conditional  expectation  of  the  process,  that  is,  if  we  set  Ys  =  E[Y^|^S]. 


Notice  that  the  forward  SDE  does  not  depend  on  Ys  or  Zs.  Thus,  the  resulting  system  of 
FBSDEs  is  said  to  be  decoupled.  If,  in  addition,  the  functions  b,  E,  h  and  g  are  deterministic,  in 
the  sense  that  they  do  not  depend  explicitly  on  w  £  H,  then  the  adapted  solution  (Y,  Z )  exhibits 
the  Markovian  property;  namely,  it  can  be  written  as  deterministic  functions  of  solely  time  and  the 
state  process: 


Theorem  4.  (The  Markovian  Property)  -  There  exist  deterministic  functions  V(t,x)  and  d(t,x'^ 
such  that  the  solution  (Yt,x,Zt,x)  of  the  BSDE  (170)  is 


Y**  =  V{s,Xl’x),  =  YT(s,X^)d(s,X^), 


(171) 


for  all  s  G  [t,  T] . 


We  now  proceed  to  state  the  nonlinear  Feynman-Kac  type  formula,  which  links  the  solution  of 
a  class  of  PDEs  to  that  of  FBSDEs.  Indeed,  the  following  theorem  can  be  proven  by  an  application 
of  Ito’s  formula: 


Theorem  5.  (Nonlinear  Feynman-Kac)  -  Consider  the  Cauchy  problem 

(Vt  +  |tr (VxxYYt)  +  Vj b(t,  x)  +  h(t,  x ,  V,  YtVx)  =  0, 
1  (t,x)  G  [0,T)  x  Mn,  V{T,x)=g{x),  x  G  Mn, 


(172) 


wherein  the  functions  E,  b,  h  and  g  satisfy  mild  regularity  conditions.  Then  (172)  admits  a  unique 
(viscosity)  solution  V  :  [0,  T]  x  Mn  — >  M,  which  has  the  following  probabilistic  representation: 


V(hx)=Ytt’x,  V(t,  x)  G  [0,  T]  x 


(173) 


wherein  (X(-),  Y(-),  Z(-))  is  the  unique  adapted  solution  of  the  FBSDE  system  (169)-(170).  Fur¬ 
thermore, 

(Yst,x,  Zg,x)  =  fvfaX**),  ST(s,XtsnVx(s,Xts ■*)),  (174) 

for  all  s  G  [t,T],  and  if  (172)  admits  a  classical  solution,  then  (173)  provides  that  classical  solution. 


llBy  abuse  of  notation,  here  (t,x)  are  symbolic  arguments  of  the  functions  V  and  d,  and  not  the  initial  condition 
parameters  as  in  (Yt,x ,  Zt,x).  Throughout  this  work,  it  should  be  clear  from  the  context  whether  (t,x)  are  to  be 
understood  as  initial  condition  parameters  or  symbolic  arguments. 
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A  careful  comparison  between  equations  (168)  and  (172)  indicates  that  the  nonlinear  Feynman- 


Kac  representation  can  be  applied  to  the  H JI  equation  given  by  ( 168 )  under  a  certain  decompos- 
ability  condition,  stated  in  the  following  assumption: 

Assumption  2.  There  exist  matrix-valued,  functions  T  :  [0,  T]  x  Mn  — >  and  B  :  [0,  T]  x  Mn  — » 

Mpx/i  such  that  G(t,x)  =  E(f,x)r(t,  x)  and  L(t,x )  =  E(t,x)B(t,x)  for  all  (f,x)  G  [0,  T]  x  Mn. 

This  assumption  implies  that  the  range  of  G  and  L  must  be  a  subset  of  the  range  of  E,  and 
thus  excludes  the  case  of  a  channel  containing  control  input  but  no  noise,  although  the  converse  is 


allowed.  Under  this  assumption,  the  HJI  equation  given  by  (168)  becomes 


Vt  +  ±tr(VxxZZT)  +  Vjf  +  q-  IVJ E  (tR^T7  -  BQ^B^  ErVx  =  0,  (t,  x)  G  [0,  T) 


V(T,x)=g(x ),  xG 


(175) 
nth 

(176) 


in  which  function  arguments  have  been  suppressed,  and  which  satisfies  the  format  of  (172)  with 

b(t,x)  =  f(t,x), 

and 

h(t,  x,  z )  =  q(t,  x)  —  \z T  (r(t,  x)i7'1rT(t,  x)  —  B(t,  x)Q^1  BT (t,  x))z.  (177) 


We  may  thus  obtain  the  (viscosity)  solution  of  (175)  by  simulating  the  system  of  FBSDE  given  by 


(169)  and  (170).  Notice  that  (169)  corresponds  to  the  uncontrolled  (u  =  0,  v  =  0)  system  dynamics. 


Connection  to  Risk-Sensitive  Control 


The  connection  between  dynamic  games  and  risk-sensitive  stochastic  control  is  well-documented  in 
the  literature.  Specifically,  the  optimal  controller  of  a  stochastic  control  problem  with  exponentiated 
integral  cost  (a  so-called  risk-sensitive  problem)  turns  out  to  be  identical  to  the  minimizing  player’s 
unique  minimax  controller  in  a  stochastic  differential  game  setting.  Indeed,  consider  the  problem 
of  minimizing  the  expected  cost  given  by 


J(r,  xr;  u(-))  =  e  In E<(  exp - 


9{xt)  +  J  q(t,xt)  +  Rut  d t 
where  e  is  a  small  positive  number.  The  state  dynamics  are  described  by  the  Ito  SDE 


(178) 


d xt  =  f(t,  xt)dt  +  G(t,  xt)utdt  + 
t  G  [r,T],  x(t)  =  xT. 


2y 


;E(t,Xt)dWt, 


(179) 
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Suppressing  function  arguments  for  notational  compactness,  the  associated  Hamilton- Jacobi-Bellman 
PDE  for  this  problem  is 


^  "■nT  \  I  t/T  /  f  I  I  „  I  ,.T  —  ‘ 


Vt  +  inf  <!  -^tr^EE  1  )  +  Vx'  (f  +  Gu)  +  q  +  ul  Ru  +  ttTVx  \  =  0,  (i, x)  G  [0, T)  x 

«eW  (  4qz  4qz  J 

V(T,x)  =  g(x),  x  G  Mn. 

(180) 

The  infimum  operation  can  be  performed  explicitly,  and  yields  the  optimal  control  u*(t,x )  = 
— R~1GT(t,  x)Vx(t,  x).  Setting  E  =  y4/272E  and  substituting  the  optimal  control  in  the  PDE 
( 180 )  we  readily  obtain  the  equivalent  characterization 


Vt  +  ltr(H^EET)  +  Vj  f  +  q-\VJ  f 


GR~1Gt  -  -EE  1  )14  =  0, 


(t,x)  G  [0,  T)  x 


V{T,x)=g(x),  x  E 


(181) 

The  above  equation  is  merely  a  special  case  of  equation  ( 168 )  obtained  for  the  game-theoretic 
version,  if  one  substitutes  Q  =  (1/e)/  and  L  =  E.  Notice  that  this  special  case  of  L  automatically 
satisfies  Assumption  1  with  B  being  the  identity  matrix.  Thus,  imposing  the  same  decomposability 
condition  on  G,  the  solution  to  the  risk-sensitive  stochastic  optimal  control  problem  can  be  obtained 
by  simulating  the  system  of  FBSDEs  given  by  (169)  and  (170)  using  the  definitions  (176)  and  (177). 


Approximating  the  Solution  of  FBSDEs 

The  solution  of  FBSDEs  has  been  studied  to  a  great  extent  independently  from  its  connection  to 
PDEs,  mainly  within  the  field  of  mathematical  finance.  Though  several  generic  schemes  exist,  in 
this  work  we  employed  a  modification  which  exploits  the  regularity  present  in  FBSDEs  that  arise 
from  the  application  of  the  nonlinear  Feynman-Kac  lemma. 

We  begin  by  selecting  a  time  grid  {t  =  to  <  ...  <  tjv  =  T}  for  the  interval  [t,T],  and  denote 
by  A U  =  tj+i  —  U  the  (i  +  l)-th  interval  of  the  grid  (which  can  be  selected  to  be  constant)  and 
AW)  =  Wti+1  —  Wti  the  ( i  +  l)-th  Brownian  motion  increment.  For  notational  brevity,  we  also 
denote  Xi  =  X^.  The  simplest  discretized  scheme  for  the  forward  process  is  the  Euler  scheme, 
which  is  also  called  Euler- Maruyama  scheme: 

f  Xi+i  ~  Xi  +  b(ti,  Xi)At-i  +  S(tj,  Xi)AWi, 

\<  =  i,...,jv,  Xo  —  x.  (182) 

Several  alternative,  higher  order  schemes  exist  that  can  be  selected  in  lieu  of  the  Euler  scheme. 
To  discretize  the  backward  process,  we  further  introduce  the  notation  Y)  =  and  Zi  =  Ztt. 
Then,  recalling  that  adapted  BSDE  solutions  impose  Ys  =  E[Y)|^g]  and  Zs  =  E[Zsl^s]  (i.e.,  a 
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back- propagation  of  the  conditional  expectations),  we  approximate  equation  (170)  by 
Yi  =  E[lj|  ~  IE[li+i  +  h(ti+\,  Xi+\,  li+i,  Zi+i)Ati\Xi\. 


(183) 


Notice  that  in  the  last  equality  the  term  Zj  Aik)  in  (170)  vanishes  because  of  the  conditional 


expectation  (A Wi  is  zero  mean),  and  we  replace  with  A*  in  light  of  the  Markovian  property 


presented  earlier.  By  virtue  of  equation  (174),  the  Z-process  in  (170)  corresponds  to  the  term 
ST(s,  Xl’x)vx(s,  Xl’x).  Therefore  we  can  write 


Zi  =  E[Zt \&u]  =  E[ET(ii,  A^V^fe,  A^A*] 
=  T,t (ti,  Xi)V xv(ti,  Xi), 


(184) 


which  naturally  requires  knowledge  of  the  solution  at  time  0  on  a  neighborhood  x,  v(ti,x).  The 
backpropagation  is  initialized  at 

Yt  =  g(XT),  ZT  =  E(T,  XT)TVxg(XT),  (185) 

for  a  g(-)  which  is  differentiable  almost  everywhere.  There  are  several  ways  to  approximate  the 


conditional  expectation  in  (183),  however  in  this  work  we  shall  employ  the  Least  Squares  Monte 


Carlo  (LSMC)  method,  which  we  shall  briefly  review  in  what  follows. 

The  LSMC  method  addresses  the  general  problem  of  numerically  estimating  conditional  ex¬ 
pectations  of  the  form  E[A|A]  for  square  integrable  random  variables  X  and  Y,  if  one  is  able  to 
sample  M  independent  copies  of  pairs  (A,  Y) .  The  method  itself  is  based  on  the  principle  that  the 
conditional  expectation  of  a  random  variable  can  be  modeled  as  a  function  of  the  variable  on  which 
it  is  conditioned  on,  that  is,  E[A|A]  =  0*(A),  where  0*  solves  the  infinite  dimensional  minimization 
problem 

0*  =  arg min E [|0(A)  -  A|2],  (186) 

<t> 

and  0  ranges  over  all  measurable  functions  with  E[|0(A')|2]  <  oo.  A  finite-dimensional  approx¬ 
imation  of  this  problem  can  be  obtained  if  one  decomposes  0(-)  ~  Vi{')ai  =  with 

<p(-)  being  a  row  vector  of  k  predetermined  basis  functions  and  a  a  column  vector  of  constants, 
thus  solving  a*  =  argminaeRfe  E[|</?(A)a  —  Y |2],  with  k  being  the  dimension  of  the  basis.  Finally, 
this  problem  can  be  simplified  to  a  linear-least  squares  problem  if  one  substitutes  the  expectation 
operator  with  its  empirical  estimator,  thus  obtaining 


M 


of  =  arg  min  -J-  ^  |^(AJ) 


*eRfc  M  -f  , 

3= 1 


a 


-Yi  I 


(187) 


wherein  (A ] ,YJ),  j  =  1 , . . .  ,M  are  independent  copies  of  (A,  Y).  Introducing  the  notation 

X*1) 


<L(A)  = 


^(A 


M\ 


6 


nMxk 


(188) 
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the  solution  to  this  least-squares  problem  can  be  obtained  by  directly  solving  the  normal  equation, 
i.e., 

a*  =  f$T(X)$(X)^  $T(V)| 


Yl 


Y 


M 


(189) 


or  by  performing  gradient  descent.  The  LSMC  estimator  for  the  conditional  expectation  assumes 
then  the  form  E|Y|X  =  x\  =  <j)*(x)  «  <p(x)ci*. 

Returning  to  our  problem,  we  may  apply  the  LSMC  method  to  approximate  the  conditional 


expectation  in  equation  (183)  for  each  time  step.  To  this  end,  we  require  a  vector  of  basis  functions 


( p  for  the  approximation  of  E[Y)|Aj].  Although  the  basis  functions  can  be  different  at  each  time  step, 
we  shall  use  the  same  symbol  for  notational  simplicity.  Then,  Monte  Carlo  simulation  is  performed 
by  sampling  M  independent  trajectories  {A]”}j=i jy,  in  which  the  index  m  =  1, . . . ,  M  specifies 
a  particular  Monte  Carlo  trajectory.  Whenever  this  index  is  not  present,  the  entirety  with  respect 
to  this  index  is  to  be  understood.  The  numerical  scheme  is  initialized  at  the  terminal  time  T 
and  is  iterated  backwards  along  the  entire  time  grid,  until  the  starting  time  instant  has  been 
reached.  At  each  time  step  tj,  we  are  given  M  pairs  of  data  ( ,  X"1  on  which  we  perform 
linear  regression  to  estimate  the  conditional  expectation  of  as  a  function  of  x  at  the  time  step 
ti .  This  provides  us  an  approximation  of  the  Value  function  v  at  time  L  for  the  neighborhood 
of  the  state  space  that  has  been  explored  by  the  sample  trajectories  at  that  time  instant,  since 
v(U,x )  =  E[Y|Xj  =  x]  «  ip(x)ai .  We  then  replace  Y™  =  E[Y)m|XJ"]  «  (p(X™)oti,  thereby  treating 
the  conditional  expectation  as  a  projection  operator.  Finally,  the  approximation  of  the  conditional 
expectation  of  Zi  is  obtained  by  taking  the  gradient  with  respect  to  x  on  v(ti,x),  evaluating  it  at 
X™,  and  scaling  it  with  E 

«E(L,vr)Tv^(vr>*.  uqo) 

Concluding  one  iteration,  this  process  is  repeated  for  fj_ i, . . . ,  t\.  Note  that  this  approach  requires 
the  basis  functions  ip(-)  of  our  choice  to  be  differentiable  almost  everywhere,  so  that  ViT<£>(x)  is 
available  in  analytical  form  for  almost  any  x.  The  proposed  algorithm  is  then  summarized  as 


Initialize  :  YT  =  g( XT),  ZT  =  E(T,  XT)  1  Vxg(XT), 

1 


on  =  arg  mm  — 
a  M 


&(Xi)a  —  +  Atih(ti+i,  Xj+i,  Y+i,  Zi- |_i) 

Yi  =  HXi)ai ,  Z™  =  E(ti,  XDTVMXT)<Xi, 


(191) 


where  m  =  1  and  the  matrix  d?  defined  in  (188).  Again,  the  minimizer  in  (191)  can  be 

(192) 


obtained  by  directly  solving  the  normal  equation,  i.e., 

-l 

sT  / 


at  =  I  T  1  (W)T(W)  T  1  (. Xi )  Yi+i  +  AUh(ti+1,  Xi+1,  Yi+1,  Zi+1) 


4Hcre,  Yi m  denotes  the  quantity  V+i  +  Xtih(ti+i,  NJ+ii  V+ij  %i+i)i  which  is  the  >7™  sample  value  before  the 
conditional  expectation  operator  has  been  applied. 
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or  by  performing  gradient  descent.  The  essential  algorithm  output  is  the  collection  of  Oj’s,  that  is, 
the  basis  function  coefficients  at  each  time  instant,  which  are  needed  to  recover  the  Value  function 
approximation  for  the  particular  area  of  the  state  space  that  is  explored  by  the  forward  process. 


Simulation  Results 


To  evaluate  the  algorithm’s  performance,  two  simulations  were  performed  on  scalar  systems,  for 
which,  owing  to  their  simplicity,  we  have  the  opportunity  to  evaluate  the  system  behavior. 

A  Linear  System  Example:  The  first  example  used  is  a  scalar  linear  system  for  which  the 
analytic  solution  can  be  recovered.  Specifically,  for  a  very  high  maximizer  control  weight  Q ,  we 
expect  the  solution  to  be  almost  identical  to  the  LQR  solution,  which  is  available  in  closed  form. 
We  simulate  the  algorithm  for  dx  =  (0.2x  +  u  +  0.5xv)dt  +  0.5du>,  with  q(t,x )  =  0,  R  =  2,  x(0)  =  1, 
T  =  1  and  g(xT)  =  40x^,  thus  penalizing  deviation  from  the  origin  at  the  time  of  termination,  T. 
For  Q,  the  maximizing  control  cost  factor,  we  selected  varying  values  ranging  from  5  to  50,000.  In 
the  latter  case,  we  expect  to  recover  the  LQR  coefficients.  For  the  purposes  of  comparison  with 
the  closed  form  solution,  the  set  of  basis  functions  for  Y  was  selected  to  be  [1  x  x2]T.  For  the  LQR 
controller,  the  coefficients  correspond  to  the  basis  functions  [1  x2]T.  Two  thousand  trajectories 
were  generated  on  a  time  grid  of  At  =  0.004.  Fig.  30  shows  that,  indeed,  for  very  hight  values  of  Q 


the  algorithm  recovers  the  correct  theoretic  LQR  coefficients,  while  Fig.  [3T|  depicts  simulations  for 
the  case  in  which  the  maximizing  control  is  allowed  to  act  on  the  system  when  it  is  relatively  cheap. 
In  this  case,  we  can  see  that  because  the  maximizer  has  enough  control  authority,  the  equilibrium 
has  moved  away  from  the  desired  value  of  x(T )  =  0,  as  expected. 

A  Nonlinear  System  Example:  To  demonstrate  that  the  scheme  can  accommodate  nonlin¬ 
earity  in  the  dynamics,  we  applied  the  algorithm  on  the  system  dx  =  (4cosx+u-|-0.5xx)dt+0.5du;. 
The  drift  was  replaced  by  a  nonlinear  term  to  introduce  an  additional  behavior  to  the  open-loop 
system  trajectories.  The  results  are  depicted  in  Fig.  [32j  From  the  shape  of  the  value  function  in 


Fig.  32  b)  it  is  seen  that  the  value  is  relatively  flat  at  the  beginning  since  there  is  no  state-dependent 


running  cost  and  becomes  progressively  quadratic  at  the  final  time  owing  to  the  boundary  condition 


V(T,xt )  =  40xy-  Note,  however,  that  Fig.  32(b)  shows  the  value  function  over  a  rectangular  grid. 


In  fact,  we  have  an  accurate  estimate  of  the  value  function  only  over  the  area  of  the  state  space 
visited  by  the  sampled  (open-loop)  trajectories.  In  that  sense,  the  areas  not  visited  by  the  system 
are  extrapolated  based  on  the  basis  functions  chosen  to  represent  V. 
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Coefficients 

Simulation  Of  the  Controlled  System  Y- coefficients  Z:  Scaled  Gradient  of  Y  (weights) 


Time  t 


(a)  (b) 

Figure  30:  Simulation  of  the  system  with  very  high  maximizing  control  cost  weight  Q  =  50,  000.  (a) 
Controlled  trajectories  (red)  vs.  uncontrolled  (blue),  (b)  Y  and  Z  coefficients,  compared  to  those  obtained 
by  the  closed  form  solution  of  the  LQR  if  the  maximizing  control  was  not  present  (black  dashed  lines). 
We  observe  that  for  a  high  maximizing  control  cost,  the  obtained  coefficients  match  those  of  the  LQR  as 
expected. 
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Simulation  of  the  Controlled  System 


Figure  31:  Simulation  of  the  system  with  small  maximizing  control  cost  weight  Q  =  5.  (a)  Controlled 
trajectories  (red)  vs.  uncontrolled  (blue),  (b)  Y  and  Z  coefficients,  compared  to  those  obtained  by  the  closed 
form  solution  of  the  LQR  if  the  maximizing  control  was  not  present  (black  dashed  lines). 
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Coefficients 


Figure  32:  Simulation  of  the  nonlinear  system  with  small  maximizing  control  cost  weight  Q  =  5.  (a) 
Controlled  trajectories  (red)  vs.  uncontrolled  (blue),  (b)  The  Value  function,  (c)  Y  coefficients. 
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